*The title relates both to the folklore story of a steel-driving
man named John Henry, who died with a hammer in his hand
rather than lose to a steam drill [Nel08], and to the psychologist
Abraham Maslow's remark that if the only tool you have
is a hammer, it is tempting to treat everything as if it were a
nail [Mas62].



1 Introduction 

The purpose of this report is to document my personal
research results of the year 2012 in a form primarily
intended for assessment of their scientific merit as a
foundation for future work, not for quantitative
assessment of the resulting publication record. This can
be considered an aggressive form of the self-archiving
initiative [Har01], where scientific and engineering
contributions are not only logged, but also put in
perspective by a separate first-class atomic scientific
knowledge object. This report is mostly meant for my
SWAT colleagues. However, it is open to a broad
audience and meant to be readable by any researcher
with a reasonable degree of familiarity with computer
science. It can be read as a self-contained document,
but many details are not pulled in from the available
referenced sources.

We start right away with an overview of the field
(§2.1), followed by brief descriptions of major (§2.2)
and minor (§2.3) contributions and a more elaborate
motivation for the creation of this document (§2.4).
Next, all research topics are laid out in detail
one by one (§3). For the sake of completeness, a
separate overview of all involved venues (§4) is included.
§5 concludes the report.

2 Preliminaries 

2.1 Background notions 

Software language is a concept that generalises over
programming languages, markup languages, database
schemata, data structures, abstract data types, data
types, modelling languages, ontologies, etc. Whenever
we observe some degree of commitment to structure,
we can identify it with a language whose elements
(symbols) can be separately defined and whose allowed
combinations can be somehow specified.
Studying software language engineering is important
because of the insights it can give into relations
between the ways such languages are defined and used in
different technological spaces (e.g., we can study data
binding as a way to map a relational database to an
object model, or language convergence as a way to
compare an XML schema with a syntax definition).



Formal grammars are a long-established approach to
dealing with languages: originally, context-free
grammars [Cho56] were mainly aimed at textual
programming languages [ASU85], but later other variants
of grammars were proposed, including keyword
grammars [GM77], indexed grammars [Aho68], lexicalised
grammars [SAJ88], object grammars [SCL12], pattern
grammars [Gre96], array grammars [SSK72], puzzle
grammars [Niv+91], picture grammars [MS67],
picture processing grammars [Cha70], tile
grammars [RP05], grid grammars [Dre+01], motion
picture grammars [BAF97], pair grammars [Pra71],
triple graph grammars [Sch95], deterministic
graph grammars [Cau07], string adjunct
grammars [JKY69], head grammars [Pol84], tree adjunct
grammars [JLT75], tree description grammars [Kal01],
description tree grammars [RVSW95], description
tree substitution grammars [RWVS01], functional
grammars [Luk77], Lukaszewicz universal
grammars [Luk82], two level grammars [Wij74], van
Wijngaarden grammars [Wij65], metamorphosis
grammars [Col78], affix grammars [Kos91], extended
affix grammars [Mei90], attribute grammars [Knu90],
extended attribute grammars [WM83], definite clause
grammars [PW86], minimalist grammars [LR01],
categorial grammars [Ajd35], type grammars [Lam58],
pregroup grammars [Lam08], Montague universal
grammars [Mon70], logic grammars [AD89],
assumption grammars [DTL97], constraint handling
grammars [Chr05], abductive logic
grammars [CD09], simple transduction grammars [LS68],
inversion transduction grammars [Wu97], range
concatenation grammars [Bou98], island
grammars [DK99], bridge grammars [NNEH09], skeleton
grammars [KL03], permissive grammars [Kat+09],
conjunctive grammars [Okh01], Boolean
grammars [Okh04], Peirce grammars [Bot01],
transformational grammars [DeR74], probabilistic
grammars [Kor11], notional grammars [And91], analytic
grammars [For04], parsing schemata [Sik97],
cooperating string grammar systems [CV+95], cooperating
array grammar systems [DFP95], cooperating puzzle
grammar systems [SSC06], etc.^ A grammar of a
software language, which specifies commitment to
grammatical structure, is called a grammar in a
broad sense [KLV05a], even if in practice it defines
a metamodel or an API, thus not officially being
a grammar at all. The grammarware technological
space is commonly perceived as mature and drained
of any scientific challenge, yet it still offers many
unsolved problems to researchers who are active in that
field.

Over the last few years, and specifically in 2012, I have
focused my efforts on using grammar-based techniques
in the broad field of software language engineering.



^The earliest possible reference is given for each variant, 
preferably from the programming language research field. 



2.2 Major contributions in a nutshell 

This section contains brief descriptions of the
contributions of 2012 and some statements about their
usability and/or importance. Sections that contain
extended descriptions of the contributions with some
level of technical detail are referenced in parentheses.

Guided grammar convergence (§3.1). 

Grammar convergence is a lightweight verification
method for establishing and maintaining the
correspondence between grammar knowledge
ingrained in various software artifacts [LZ09a]. The
method entails programming grammar transformation
steps with a general purpose grammar
transformation operator suite. It was acknowledged
in [Cam10, p. 34] as "a product-line approach
to provide [...] an organised software
structure". Yet, the method had some weaknesses
that inspired further investigation.

One of the biggest issues is maintenance of the 
grammar relationships. Once they have been 
established by programming grammar transfor- 
mation steps, it becomes very hard to coevolve 
these steps with eventual changes in the source 
grammars. An ideal solution would be a way
to automatically reestablish grammar relationships
based on declarative constraints. This is what
guided grammar convergence provides: instead of
programming the transformations, we construct an
idealised "master grammar" that shows the most 
essential properties of all grammars that are to be 
converged, and the transformation steps are then 
derived automatically, guided by the structure of 
the master grammar. 

The transformation inference algorithm relies on
the source grammars and their metasyntax. This
method was prototyped twice, in Python and in
Rascal, and tested successfully on 12 grammars
in a broad sense obtained from different
technological spaces. It has not been properly
published after being rejected three times [Zay12h;
Zay12i; Zay12j], but it received encouraging
feedback from some of those venues and from one
presentation [Zay12g].

Grammar transformation (§3.2) 

Grammar convergence, evolution, maintenance
and any other activity that deals with changes
can profit from expressing such changes in a
functional way: every step is represented as a
function application, where a function is a
transformation operator such as rename or add. The
latest of such operator suites was developed
in 2010 [Zay+08, XBGF Manual] and shown to
be superior to its alternatives [LZ11, §4].

During 2012, XBGF has been:

• reimplemented in Rascal, which led to extensive
testing and more systematic specification of
operator semantics (§3.2.1);

• extended for bidirectionality by pairing operators,
introducing lacking ones and abandoning
unsuitable ones (§3.2.2);

• experimentally extended for adaptability
(§3.2.3);

• extended by mining patterns of its usage
(§3.2.4);

• investigated for migration from the functional
paradigm to the declarative one
(§3.2.5).

Each of these initiatives is a nontrivial project 
complete with conceptual motivation, pro- 
grammed prototypes and obtained results (posi- 
tive for the first three, controversial for the fourth 
and decisively negative for the last one). 

Metasyntactic experiments (§3.3) 

Metasyntax, the language in which grammars are
specified, was a topic briefly touched upon in my PhD
thesis [Zay10], but never officially published. In
2012, I finally dedicated enough time and attention
to engineer a proper prototype for metasyntax
specifications (§3.3.1) and their transformations
(§3.3.2), as well as to perform a series of
experiments on metasyntax-driven grammar recovery
(§3.3.3) and convergence (§3.3.4). This
area has now been exhaustively covered, and any
future extensions must go well beyond textually
specified context-free grammars.

To be completely frank, it should be noted here 
that most of the experiments with metasyntax 
were done in the course of 2011 and were only 
polished, presented and published in 2012 (which 
still required considerable effort). 

Tolerant parsing overview (§3.4)

Just as the grammar recovery paper came with
an extensive related work section listing all
grammar recovery initiatives of the last decade or
two [Zay12x, §2], a new parsing algorithm that
I tried to propose (§3.8.2) came with an extensive
overview of all methods of tolerant parsing
known to grammarware engineers to date
(§3.4). While the iterative parsing method was
novel but ultimately dull and uninteresting, the
overview itself was received very warmly during
the presentation on it [Zay12ag]. One of the
reviewers of [Zay12n] also advised me to throw
away the thing I thought was the main contribution
of the paper, and to extend the thing I thought
of as a byproduct into a longer journal article.
While surprising at first, this indeed seems like a
reasonable course of action.



2.3 Selected minor contributions 

In the following sections, I will present a detailed
overview of major (§§3.1-3.7) and minor (§3.8)
contributions, but the border between them is naturally
flexible. Thus, the previous section introduced only
four of the best major ones, and this section will
introduce several middleweight contributions ("less major"
mixed with "not so minor" ones).

Grammar mutation (§3.8.1) 

It has been noted in [Zay12p; Zay12r] that there
is a separate group of grammar changes that
reside between traditional grammar transformations
("rename X to Y") and the grammar transformation
operators ("rename"); such a change was labelled
a grammar mutation and formalised differently
from them. While the only truly important
property of grammar mutations in the context
of [Zay12p; Zay12r] was that they are considerably
harder to bidirectionalise, a lot of useful
grammar manipulations like "rename all uppercase
nonterminals to lowercase" or "eliminate all
nonterminals unreachable from the root" belong
to the class of mutations, so it deserves to be
studied more closely. In [Zay12ah], I have composed
a list of 16 mutations identified in already
published academic papers or in publicly available
grammarware source code, but the paper was not
accepted, so the topic remains only marginally
explored.
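
To make the distinction more tangible, the two mutations quoted above can be sketched as whole-grammar rewrites. This is only an illustration under stated assumptions: the grammar representation (a dict from nonterminal names to lists of alternatives, each a tuple of symbol names) and the function names are hypothetical and are not taken from [Zay12ah] or from any existing prototype.

```python
def lowercase_nonterminals(grammar):
    """Mutation: rename every defined nonterminal to its lowercase form."""
    defined = set(grammar)

    def lower(symbol):
        # Only defined nonterminals are renamed; other symbols stay untouched.
        return symbol.lower() if symbol in defined else symbol

    result = {}
    for nonterminal, alternatives in grammar.items():
        name = lower(nonterminal)
        if name in result:
            raise ValueError(f"lowercasing makes {name!r} ambiguous")
        result[name] = [tuple(lower(s) for s in alt) for alt in alternatives]
    return result


def eliminate_unreachable(grammar, root):
    """Mutation: drop all nonterminals that cannot be reached from the root."""
    reachable, todo = set(), [root]
    while todo:
        symbol = todo.pop()
        if symbol in grammar and symbol not in reachable:
            reachable.add(symbol)
            for alt in grammar[symbol]:
                todo.extend(alt)
    return {nt: alts for nt, alts in grammar.items() if nt in reachable}


# A hypothetical input grammar with one unreachable nonterminal.
grammar = {
    "Program": [("Function",)],
    "Function": [("str", "Expr")],
    "Expr": [("str",), ("int",)],
    "Orphan": [("Expr",)],
}
mutated = eliminate_unreachable(lowercase_nonterminals(grammar), "program")
print(mutated)
```

Neither function takes the names to be changed as arguments: which concrete renamings or removals happen depends entirely on the input grammar, which is one way to read the difference between a mutation and an operator such as "rename".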

Iterative parsing (§3.8.2) 

Starting from the fresh yet weird topic of what
"the cloud" can mean for grammarware engineering,
I ended up proposing an algorithm for
parsing in the cloud, which was not based on
parallel parsing [Alb+94], but rather on island
grammars [Moo01; KL03]. The whole topic is
questionable and only suitable for a "wild ideas
workshop", as was nicely put by one of the
reviewers, but is still potentially of some interest.
The paper containing the algorithm has been rejected
twice so far [Zay12n; Zay12o], and requires
investing more time, at least in empirical validation,
in order to increase the chances of acceptance.

Unparsing in a broad sense (§3.8.3) 

I could not help noticing that parsing (i.e.,
mapping strings to graphs) receives much more
research attention than the reverse process of
unparsing (i.e., mapping graphs to strings).
However, the only thing I did accomplish this year
was to collect a couple of references on existing
research and write a "new ideas" extended
abstract [Zay12ai], which was classified as a
"request for discussion" and rejected. I am already
prepared to give a discussion-provoking
presentation on this topic, but much more
effort must be invested before more tangible results
are obtained.
Figure 1: The results of 2012, according to DBLP.



Megamodelling (§3.5) 

Megamodelling is a form of modelling at a higher
level of abstraction that is concerned with software
languages and technologies and the relations between
them. This year I have published some papers
with megamodels in them [Zay12u; Zay12v;
Zay12ab; Zay12ac; Zay12k] and touched upon
the topic in a range of presentations [Zay12s;
Zay12ad; Zay12aa; Zay12w]. Much more work on
this topic is planned for 2013.

Open notebook computer science (§3.7) 

Open notebook science is an open science
paradigm of doing research in a transparent way.
It is already a fairly widely accepted methodology
in areas like chemistry [San08] and drug
discovery [Sin08] and is generally perceived as the
next big step after open access [Lio08]. However,
in computer science and software engineering it
has never been a tradition to keep a lab notebook,
and maintaining one takes quite some time,
with few immediately visible benefits. I have been
experimenting quite a lot with this idea, and
finally decided to present it to a wider audience with
two presentations in 2012 [Zay12z; Zay12af]. In
general, I believe this is a reasonable idea, and
I will keep practicing open notebook science myself,
but it will take quite some effort to put it
carefully into words in order to publish, so I am
not even sure it is feasible to expect a publication
in 2013.

2.4 Motivation for this report 

The progress of a scientist is traditionally measured by
an outsider by the papers that the scientist produces.
According to DBLP, currently the main supplier of
bibliography lists, the year 2012 yielded the
following results for me (see the screen capture in Figure 1):
one journal paper [Zay12r], one conference proceedings
paper [Zay12d], one preprint [Zay12k]. However,
the first one is an only slightly extended version of a
workshop paper [Zay12p] written mostly in 2011; the
second one was written and accepted in 2011; and the
third one was intended to be supplementary material
for another paper that is not yet accepted anywhere.
Additionally, there are three more post-proceedings
papers in print [Zay12x; Zay12ac; Zay12v], which are
already finished and submitted and will eventually
appear in the ACM Digital Library; when they do,
they will also be listed at DBLP under 2012, but by
that time it will be too late to write a year report.

What about the self-archiving initiative [Har01]?
Luckily, I disclose relatively large amounts of dark
data [Goe07] about my research activities, having an
extensively linked, daily updated website with an open
notebook (see §3.7) and many generated lists, including
the current publishing progress, as seen in the
screen capture in Figure 2. Even judging by the bare
numbers, one can already tell that this list contains
much more information than the DBLP list. However,
it also has its problems: the "published" column
contains works of previous years that happened to be
delayed enough for the post-proceedings to appear in
January 2012 [FLZ12]; the list also mentions drafts
planned for future publication (easily localised in the
last column). It also contains editorial work for
non-mainstream venues [JZ12b; JZ12a], which is of much
lesser relevance because there is no scientific value to
it. What it does not contain is the relations between
all these papers: obviously some papers are enhanced
versions of previously rejected drafts, but in order to



Figure 2: The results of 2012, according to the self-archiver (screen capture of the "Current publishing progress" list).



figure them out, one needs to read the open notebook
at http://grammarware.net/opens or analyse it
automatically (no readily available tools are provided).

Personally, I can state that guided grammar
convergence (see §3.1) is my top result of the
year. However, it has not (yet) been properly
published. After being rejected at ECMFA [Zay12h]
and ICSM [Zay12i], it received very positive reactions
from POPL [Zay12j], yet was also deemed not
mature enough for publication. Still, having to figure
out what the limits of the proposed methodology are
and how to describe it well does not change the fact
that this is my best contribution of the year 2012.

Grammar transformation operator suites like 
XBGF (see §3.2.1), SBGF (§3.2.2), EXBGF (§3.2.4), 
ABGF (§3.2.5) and NBGF (§3.2.3) represent massive 
amounts of work, but they are not publishable by 
themselves, if at all. Still, each of them represents 
a milestone enabling further advances. Engineering
work that supports scientific research has rarely been
explicitly noted and appreciated.

Quoting [San08]: "The notebook is about publishing
data as quickly as possible. The paper is about
synthesizing knowledge from all those results." Hence, this
report is aimed at synthesizing knowledge about the
experiments and achievements undertaken during the
course of 2012 by me (possibly in collaboration with
someone else) within the NWO project 612.001.007,
"Foundations for a Grammar Laboratory". It holds
the most value for myself and my project colleagues,
but is also available for anyone interested in the topics
discussed: unlike open notebook entries, this report is
a proper atomic scientific knowledge object [Giu+10;
Sim+11]. Only two topics directly relevant to the
project are not included: one must remain hidden
according to the rules of the target venue, and for the
other one the context and consequences are not yet
understood well enough even for such a lightweight
presentation.

3 Topics overview 

3.1 Guided grammar convergence 

Let us consider two grammars in a broad sense
[KLV05a]. We say that they represent one intended
software language if there exists a complete
bidirectional mapping between language instances that
commit to the grammatical structure of the different grammars.
For example, if a parser produces parse trees that can 
always be converted to abstract syntax trees expected 
by a static analysis tool and back, it means that they 
represent the same intended language. As another ex- 
ample, consider an object model used in a tool that 
stores its objects in an external database (XML or 
relational): the existence of a bidirectional mapping 
between entries (trees or tables) in the database and 
the objects in memory, means that they represent the 
same intended language, even though they use very 
different ways to describe it. An equivalence class 
spawned by this definition (i.e., a set of different gram- 
mars of the same intended language) effectively forms 
a grammarware product line of products that perform 
different tasks on instances of the same intended lan- 
guage: in that sense, for example, all Java-based tools 
form a product line, if they agree on a language ver- 
sion and do not employ any highly permissive methods 
that would shift them into a broader class. For the
sake of simplicity, let us focus on grammar product 
lines: collections of grammars of the same intended 
language. The relation between a grammar product 
line and a grammarware product line is justified by 
research on automated derivation of grammar-based 
tools like parsers, environments, documentation,
formatters and renovators from grammars [Kli93; SV99;
Jon02; KLV05a; Cam+10; ZL11].

Suppose that we have two grammars: one that we 
call a master grammar (a specially pre-constructed 
abstract grammar of the intended language) and one 
that we call a servant grammar (a grammar derived 
from a particular language implementation). In gen- 
eral, there are four phases of guided grammar con- 
vergence, and they are presented in this section in the 
reverse order. First, we consider the simplest scenario 
when all mismatches are of structural nature. Then, 
we move on to a more complicated situation when a 
nominal matching between sets of nonterminals is un- 
known. Since this is rather uncommon (most methods 
used in practice for imploding parse trees to abstract 
syntax trees, from Popart [Wil97] to Rascal [KSV11],
heavily rely on equality of names), a new method for 
matching nonterminals has been developed. In short, 
it comprises construction of production signatures for 
each production rule in both grammars, and a search 
for equivalent and weakly equivalent production rules 
with respect to those signatures. Once a name resolu- 
tion relation has been successfully built, a previously 
discussed structural matching can be applied. We will 
also discuss normalisations that can transform any
arbitrary grammar to a form easily consumable by our
nominal and structural matching algorithms. Finally,
I will list additional problems that stem from grammar
design decisions and are therefore not affected by
normalisations, and describe how to detect such issues
automatically and address them with grammar
mutations.

Structural matching 

Let us assume the simplest scenario: the two input
grammars have the same set of nonterminals; neither
of them has terminals; the starting nonterminal is the
same; and the sets of production rules are different
but have the same cardinality. These would be
typical circumstances if, for example, the grammars
define two alternative abstract syntaxes for the same
intended language.

We can start from the roots of both grammars and 
traverse them synchronously top-down, encountering 
only the following four circumstances: 

Perfect match. Convergence is trivially achieved. 

Nonterminal vs. value. By "values" I mean non- 
terminals that are built-in in the underlying 
framework (e.g., "string"). 



Production rule                            Production signature
---------------------------------------------------------------
p1  = (program → function+)                {⟨function, +⟩}
p2  = (function → str str+ expr)           {⟨expr, 1⟩, ⟨str, 1+⟩}
p3  = (expr → str)                         {⟨str, 1⟩}
p4  = (expr → int)                         {⟨int, 1⟩}
p5  = (expr → apply)                       {⟨apply, 1⟩}
p6  = (expr → binary)                      {⟨binary, 1⟩}
p7  = (expr → cond)                        {⟨cond, 1⟩}
p8  = (apply → str expr+)                  {⟨expr, +⟩, ⟨str, 1⟩}
p9  = (binary → expr operator expr)        {⟨expr, 11⟩, ⟨operator, 1⟩}
p10 = (cond → expr expr expr)              {⟨expr, 111⟩}

Table 1: Production rules of the master grammar for
FL, with their production signatures.



Sequence element permutations can be automatically
detected and converged.



Lists of symbols. Many frameworks that have com- 
ponents with grammatical knowledge, have a no- 
tion of a list or a repetition of symbols in their 
metalanguage. 

It can be shown that these four are the only
possibilities, and that each of them can be resolved.
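
As an illustration of one of these circumstances, the detection of a sequence element permutation can be sketched as follows. This is a minimal sketch and not the algorithm of the prototypes: the representation of a right hand side as a tuple of symbol names and the function name `find_permutation` are assumptions made for this example only.

```python
from collections import Counter


def find_permutation(master_rhs, servant_rhs):
    """Return servant indices ordered to line up with master_rhs, or None."""
    if Counter(master_rhs) != Counter(servant_rhs):
        return None  # not a permutation: some other kind of mismatch
    unused = list(range(len(servant_rhs)))
    order = []
    for symbol in master_rhs:
        # Take the leftmost unused servant position holding the same symbol.
        index = next(i for i in unused if servant_rhs[i] == symbol)
        unused.remove(index)
        order.append(index)
    return order


master = ("expr", "operator", "expr")
servant = ("operator", "expr", "expr")  # hypothetical servant right hand side
order = find_permutation(master, servant)
reordered = tuple(servant[i] for i in order)
print(order, reordered)
```

The returned index list is the information a converging step would need in order to reorder the servant sequence so that it matches the master one.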

Nominal resolution 

In a more complicated scenario, let us consider the 
case of different nonterminal sets in two input gram- 
mars, and for simplicity we assume that all produc- 
tion rules are vertical (non-flat) and chained (if there 
is more than one production rule for the same nonter- 
minal, all of them are chain productions — i.e., have 
one nonterminal as their right hand side). Next, we 
define a footprint of a nonterminal in an expression as 
follows: 



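\[
\pi_n(x) =
\begin{cases}
\{1\} & \text{if } x = n \\
\{?\} & \text{if } x = n? \\
\{+\} & \text{if } x = n^{+} \\
\{*\} & \text{if } x = n^{*} \\
\bigcup_{e \in L} \pi_n(e) & \text{if } x \text{ is a sequence } L \\
\varnothing & \text{otherwise}
\end{cases}
\]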



By extension, we define a footprint of a nonterminal 
in a production rule as a footprint of it in its right 
hand side: 

\[ \pi_n(m \to e) = \pi_n(e) \]

Based on that, we define a production signature, 
or a prodsig, of a production rule, by collecting all 
footprints of all nonterminals encountered in its right 
hand side: 

\[ \sigma(p) = \{ \langle n, \pi_n(e) \rangle \mid n \in N,\ \pi_n(e) \neq \varnothing \} \]

Examples of production signatures can be found
in Table 1.
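
The definitions above can be checked against Table 1 with a small sketch. It is an illustration under stated assumptions: a right hand side is modelled as a flat list of (symbol, mark) pairs with marks "1", "?", "+" and "*", a footprint multiset is rendered as a string with marks in that order, and the names `footprint`, `prodsig` and `MASTER` are hypothetical.

```python
MARK_ORDER = "1?+*"  # display order for footprint marks (an assumption of this sketch)


def footprint(n, rhs):
    """Footprint of nonterminal n in a flat sequence of (symbol, mark) pairs."""
    marks = [mark for symbol, mark in rhs if symbol == n]
    return "".join(sorted(marks, key=MARK_ORDER.index))


def prodsig(rhs):
    """Production signature: all (nonterminal, footprint) pairs of a right hand side."""
    # Only symbols that occur in rhs are considered, so no footprint is empty.
    return {(n, footprint(n, rhs)) for n, _ in rhs}


# The master grammar for FL, transcribed from Table 1.
MASTER = {
    "p1": ("program", [("function", "+")]),
    "p2": ("function", [("str", "1"), ("str", "+"), ("expr", "1")]),
    "p3": ("expr", [("str", "1")]),
    "p4": ("expr", [("int", "1")]),
    "p5": ("expr", [("apply", "1")]),
    "p6": ("expr", [("binary", "1")]),
    "p7": ("expr", [("cond", "1")]),
    "p8": ("apply", [("str", "1"), ("expr", "+")]),
    "p9": ("binary", [("expr", "1"), ("operator", "1"), ("expr", "1")]),
    "p10": ("cond", [("expr", "1"), ("expr", "1"), ("expr", "1")]),
}

for label, (lhs, rhs) in MASTER.items():
    print(label, lhs, sorted(prodsig(rhs)))
```

Since a signature only records which marks a nonterminal occurs with, the order of symbols within a sequence does not influence it.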






We say that two production rules are prodsig-equivalent
if and only if there is a unique match between
tuple ranges of their signatures:

\[ p \Bumpeq q \iff \forall \langle n, \pi \rangle \in \sigma(p),\ \exists! \langle m, \xi \rangle \in \sigma(q),\ \pi = \xi \]

Similarly, a weak prodsig-equivalence \(p \bumpeq q\) is
defined by dropping the uniqueness constraint and
weakening the equality constraint in the last definition
to footprint equivalence \(\pi \approx \xi\), which disregards
repetition kinds (+ is equivalent to *). Then it can be proven
that for any two strongly prodsig-equivalent production
rules p and q, \(p \Bumpeq q\), a nominal resolution
relationship has the form of:

\[ p \diamond q = \sigma(p) \circ \overline{\sigma(q)} \]

where \(\rho_1 \circ \rho_2\) is a composition of two relations in
the classic sense and \(\overline{\rho}\) is the classic inverse of a
relation. Moreover, for any two weakly prodsig-equivalent
production rules p and q, \(p \bumpeq q\), there is (at least one)
nominal resolution relationship \(p \diamond q\) that satisfies the
following:

\[ \forall \langle a, b \rangle \in p \diamond q:\ a = \omega \lor b = \omega \lor \exists \pi, \exists \xi,\ \pi \approx \xi,\ \langle a, \pi \rangle \in \sigma(p),\ \langle b, \xi \rangle \in \sigma(q) \]

and

\[ \forall \langle a, b \rangle \in p \diamond q,\ \forall \langle c, d \rangle \in p \diamond q:\ a = c \Rightarrow b = d \]

where \(\omega\) is used to explicitly denote unmatched
nonterminals.
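
The strong case can be illustrated with a short sketch. It assumes that a signature is given as a set of (nonterminal, footprint) pairs with footprints rendered as strings such as "1+", and it reads the equivalence formula literally, in one direction only; a symmetric variant would call the check both ways. The function names and the servant signatures are hypothetical.

```python
def strongly_equivalent(sig_p, sig_q):
    """Each footprint in sig_p occurs in exactly one tuple of sig_q."""
    return all(sum(1 for _, xi in sig_q if xi == pi) == 1 for _, pi in sig_p)


def weaken(fp):
    """Footprint equivalence disregards repetition kinds: '*' is treated as '+'."""
    return fp.replace("*", "+")


def weakly_equivalent(sig_p, sig_q):
    """Each footprint in sig_p has at least one footprint-equivalent tuple in sig_q."""
    return all(any(weaken(xi) == weaken(pi) for _, xi in sig_q) for _, pi in sig_p)


def resolve(sig_p, sig_q):
    """sigma(p) composed with the inverse of sigma(q): names sharing a footprint."""
    return {(n, m) for n, pi in sig_p for m, xi in sig_q if pi == xi}


master = {("expr", "1"), ("str", "1+")}      # function -> str str+ expr (Table 1)
servant = {("Expr", "1"), ("String", "1+")}  # hypothetical servant rule
starred = {("Expr", "1"), ("String", "1*")}  # same shape, but with a star

print(strongly_equivalent(master, servant), sorted(resolve(master, servant)))
print(strongly_equivalent(master, starred), weakly_equivalent(master, starred))
```

In the first pair the resolution maps expr to Expr and str to String; in the second pair only the weak equivalence holds, so a resolution would have to be searched for rather than computed by composition.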

Abstract Normal Form 

In order to fit any grammar into the conditions re- 
quired by the previously described matching tech- 
niques, we demand the following normalisation: 

1. lack of labels for production rules

2. lack of named subexpressions 

3. lack of terminal symbols 

4. maximal outward factoring of inner choices 

5. lack of horizontal production rules 

6. lack of separator lists 

7. lack of trivially defined nonterminals (with α, ε
or φ)

8. no mixing of chain and non-chain production 
rules 

9. the nonterminal call graph is connected, and its 
top nonterminals are the starting symbols of the 
grammar 

It can be shown that transforming any grammar 
into its Abstract Normal Form is in fact a grammar 
mutation (see §3.8.1). In the prototype, I have imple- 
mented it to effectively generate bidirectional gram- 
mar transformation steps, so the normalisation pre- 
serves any information that it needs to abstract from. 



Grammar design mutation 

Some grammar design smells (terminology per
[Sto12a]) like yaccification (per [SV99; BSV98]) or
layered expressions (per [LZ09a]) have proven to be
persistent enough to survive all normalisations and cause
problems for establishing nominal and structural
mappings. They can be identified and dealt with by
automated analyses and mutations, but so far I have no
proof that they are the only possible obstacles, and
no guarantees about any other smells problematic for
guided grammar convergence.

3.1.1 Generalisation of production signatures 

The method of establishing nonterminal mappings between
different grammars of the same intended language
can be generalised as follows. Suppose that we have
a metalanguage. Without loss of generality, let us
assume that each grammar definition construct that is
present in it can be referred to by a single symbol
(",", "?", "*", etc.) and uses prefix notation. This
metasyntactic alphabet A will form the foundation of our
footprints and signatures. Let us also assume that all
metasymbols are unary or are encoded as unary,
except for two composition constructs: a sequential ","
and an alternative "|", which take a list of symbols.

Then, a footprint of any nonterminal n in an
expression x is a multiset of metasymbols that are used
for occurrences of n within x:

\[
\pi_n(x) =
\begin{cases}
\{1\} & \text{if } x = n \\
\{\mu\} & \text{if } x = \mu(n),\ \mu \in A \\
\bigcup_{e \in L} \pi_n(e) & \text{if } x = {,}(L) \\
\varnothing & \text{otherwise, also if } x = {|}(L)
\end{cases}
\]

Our previously given definition of a production
signature can still be used with these generalised
footprints.
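
The generalised definition translates almost literally into a recursive function. The sketch below is an illustration only and rests on assumptions of its own: an expression is either a nonterminal name (a string) or a pair of a metasymbol and its argument, where the argument of "," and "|" is a list; multisets are modelled with `collections.Counter`; the function names are hypothetical.

```python
from collections import Counter


def footprint(n, x):
    """Multiset (Counter) of metasymbols used for occurrences of n within x."""
    if x == n:
        return Counter({"1": 1})
    if isinstance(x, tuple):
        metasymbol, argument = x
        if metasymbol == ",":
            total = Counter()
            for e in argument:
                total += footprint(n, e)
            return total
        if metasymbol != "|" and argument == n:
            return Counter({metasymbol: 1})
    return Counter()  # otherwise, which includes alternatives


def prodsig(rhs, nonterminals):
    """Signature: nonterminals mapped to their non-empty footprints in rhs."""
    footprints = {n: footprint(n, rhs) for n in nonterminals}
    return {n: fp for n, fp in footprints.items() if fp}


# function -> str str+ expr, written in prefix notation
function_rhs = (",", ["str", ("+", "str"), "expr"])
choice_rhs = ("|", ["str", "int"])

print(prodsig(function_rhs, ["str", "expr", "int"]))
print(prodsig(choice_rhs, ["str", "expr", "int"]))
```

Supporting a metalanguage with additional unary constructs then requires no change to the function, since any metasymbol other than "," and "|" is treated uniformly.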

It is well known that language equivalence is unde- 
cidable. Any formulation of the grammar equivalence 
problem, that is based on language equivalence, is 
thus also undecidable. Grammar convergence [LZ09a;
LZ11] is a practically reformulated grammar
equivalence problem that uses automated grammar
transformation steps programmed by a human expert. By
using these generalised metasyntactic signatures, we 
can infer converging transformation steps automati- 
cally, thus eliminating the weakest link of the present 
methodology. However, this is not the only applica- 
tion of the generalisation. 

The most trivial use of metasyntactic footprints
and signatures would lie in grammarware metrics.
Research on software metrics applied to context-free
grammars has never been an extremely popular topic,
but it did receive some attention in the 1970s [Gru71],
1980s [Kel81] and even recently [PM04; Cre+10].
Using quantitative aspects of metasyntactic footprints
and signatures (numbers of different footprints within
the grammar, statistics on them, etc.) is possible and
conceptually akin to using micropatterns [GM05] and
nanopatterns [Bat10], but nothing of this kind has
ever been done for grammars (in a broad sense or
otherwise).

A different, more advanced application of metasyntactic
footprints and signatures is the analysis of their
usage by mining existing grammar repositories like 
Grammar Zoo [Zay+08]. This can lead to not only 
improving the quality of the grammars by increasing 
their utilisation of the metalanguage functionality, but 
also to validation of metalanguage design. The whole 
programming language community uses dialects and 
variations of BNF [Bac60] and EBNF [Wir77], but 
their design has never been formally verified. How- 
ever, one may expect that introducing EBNF ele- 
ments like symbol repetition to BNF can be justified 
by analysing plain BNF grammars and finding many 
occurrences of encoding them ("yaccification", etc.).
It will also be interesting to see what new features 
the EBNF lacks practically — none of the existing 
proposals so far (ABNF [Ove05], TBNF [Man06], etc) 
were ever formally validated. 

3.1.2 History of attempted publication 

Initially, the idea of guided grammar convergence
emerged as a contribution for ECMFA [Zay12h]. The
level of contribution was praised by the reviewers, but
the paper itself was deemed inappropriate for a
heavily model-related venue. A bit later it was
resubmitted after minor revision to ICSM [Zay12i], where it
was received even more coldly, presumably because the
reviewers were seeking a more practical side, which was
not demonstrated well enough. After much more
effort put into experiments, prototypes, auxiliary
material [Zay12k] and a complete rewrite of the paper
itself, the method was submitted to POPL [Zay12j].
It was unanimously rejected, but with very
constructive and encouraging reviews. In 2013, they will be
taken into account when the paper is submitted
again (for the last time as a conference paper:
otherwise I will admit that it is impossible for me to explain
this method within the common limitations and go for
a much longer self-contained journal submission).

In [Zay12m], I attempted to sell the very act of
validating the new method of guided grammar convergence
by letting it cover the older case study done with
contemporary grammar convergence, as some sort of
experimental replication in a broad sense. The reviewers
praised the nonconformism and originality of the
approach, and rejected the paper.

The generalisation of the method was proposed as an
extended abstract to NWPT [Zay12t], where the reviewers
did not see any merit in it (which I personally found
strange, since both ICSM and POPL reviewers insisted that
various components of the method like ANF and prodsigs
must be treasured as standalone contributions whose
applicability is much wider than the automated
convergence of grammars). Either my way of explaining was
bad enough to obfuscate this point, or I have terribly
misunderstood their call for papers.

3.2 Grammar transformation languages 

3.2.1 XBGF 

XBGF, standing for Transformation of BNF-like Grammar
Format, is a domain-specific language for automated
programmable operator-based transformations of grammars
in a broad sense. It was previously implemented in Prolog
(mostly by Ralf Lämmel) and published as a part of a
journal article [LZ11, §4], as well as a separate online
manual [Zay+08, XBGF Manual], which is in fact just a
byproduct of the research on language documentation
[ZL11].

XBGF is essentially finished work: it is working, it is
useful for experiments, it has documentation, it has a
test suite, etc. The only thing that was added in the
course of 2012 is the reimplementation of XBGF in Rascal
[KSV11]. Besides some metaprogramming, this
reimplementation led to streamlining some of the
applicability preconditions and postconditions, which
could be viewed as a very minor scientific contribution.

3.2.2 ΞBGF

If XBGF is read as "iks bee gee eff", then ΞBGF is "ksee
bee gee eff", its bidirectional counterpart. Inspired by
the call for papers of BX'12 (The First Workshop on
Bidirectional Transformations, see §4.1), I was
experimenting with bidirectionality in the grammarware
technological space, and this language is what came out
of it. 80% of the work for creating it involved trivial
coupling of grammar transformation operators like chain
and unchain, but the remaining 20% provided a lot of fuel
for thinking about what seemed to be a polished and
finished product. ΞBGF was published as a part of online
pre-proceedings [Zay12p], and then, after the second
round of reviews, as a journal article [Zay12r]. The only
problem was that the BX paper took off on its own, so the
bidirectional grammar transformation operator suite seems
like one of many byproducts there. There was a failed
attempt to craft a paper that would be more focused on
ΞBGF (and other aspects of grammar transformation not
covered sufficiently by the BX submission), but a wrong
venue was targeted, which resulted in desk rejection
[Zay12ah].

3.2.3 NXBGF? 

Another property of programmable grammar transformations
that always bothered me was their rigidity: once written,
they are hard to maintain and adapt, and one little
change in the original grammar (for example, when the
extractor is changed) can unexpectedly and unpredictably
break (make defunct) some of the transformation steps
much later in the chain, and there is no method available
to detect the change impact. Analysing this problem led
to an idea that was originally in preparation for the
FM+AM workshop (see §4.2), but was not ready before the
deadline, so it went to the Extreme Modelling Workshop
instead, where it received a surprisingly warm reaction.

The idea is: negotiations. Whenever an error arises 
(usually an applicability condition is not met), instead 
of failing the whole chain, try to recover by negotiat- 
ing the outcome with the data about near-failure and 
some external entity (usually an oracle or a human 
operator). For example, when we want to rename a 
nonterminal that does not exist, the transformation 
engine may seek nonterminals with names similar to 
the required one, and try renaming them. 
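
The renaming example can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the
published design: the grammar is a toy dictionary, the function
names rename and negotiated_rename are hypothetical, the
"external entity" is modelled as a callback named approve, and
name similarity is measured with the standard library function
difflib.get_close_matches.

```python
import difflib
from typing import Callable, Dict, List

# Assumed toy representation: nonterminal -> alternatives -> symbols
Grammar = Dict[str, List[List[str]]]


def rename(grammar: Grammar, old: str, new: str) -> Grammar:
    """Strict rename: fails when the applicability condition is not met."""
    if old not in grammar or new in grammar:
        raise ValueError(f"cannot rename {old!r} to {new!r}")
    swap = lambda s: new if s == old else s
    return {swap(n): [[swap(s) for s in alt] for alt in alts]
            for n, alts in grammar.items()}


def negotiated_rename(grammar: Grammar, old: str, new: str,
                      approve: Callable[[str, str], bool]) -> Grammar:
    """Try the strict rename; on failure, negotiate a similar nonterminal."""
    try:
        return rename(grammar, old, new)
    except ValueError:
        # Near-failure data: which existing names resemble the requested one?
        for candidate in difflib.get_close_matches(old, list(grammar)):
            if approve(old, candidate):  # oracle or human operator decides
                return rename(grammar, candidate, new)
        raise  # nothing acceptable was negotiated: fail as before


g = {"expression": [["term"], ["expression", "+", "term"]],
     "term": [["ID"]]}
# The script asks for a misspelt nonterminal; the oracle accepts the proposal.
print(negotiated_rename(g, "expresion", "Expr", lambda asked, found: True))
```

If the oracle rejects every proposal, the original error is
raised again, so the negotiated step is never weaker than the
strict one.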

The idea of negotiated grammar transformations was
published in the online proceedings [Zay12u] and then in
the ACM Digital Library [Zay12v], after which I was
invited to submit an extended version to a journal. This
will soon lead to a prototype implementation of such a
system and perhaps to some interesting experiments with
it. If this advancement yields yet another grammar
transformation operator suite, it may or may not be named
"NXBGF".



3.2.4 EXBGF 

Considerations about the state of XBGF led me to start a
cursory reexamination of the available transformation
scripts. The Java case study undertaken in 2009-2010 and
published as a conference paper [LZ09b], a journal paper
[LZ11] and an open source repository [Zay+08] provided me
with plenty of them. Manual ad hoc pattern recognition
resulted in the development of a new operator suite, with
higher order operators such as exbgf:pull-out, which
would be equivalent to a superposition of
xbgf:horizontal, xbgf:factor, xbgf:extract and
xbgf:vertical. As shown in Table 2, size metrics show a
drop of 23-26% in Extended XBGF with respect to XBGF, and
the complexity obviously decreased as well. However, the
results were not extremely convincing and lacked real
strength, since only a few uses per high level operator
were found, and the new EXBGF language was not designed
systematically. Besides all that, the case study I have
done is, strictly speaking, about refactoring XBGF
scripts to Extended XBGF, so claims about the usefulness
of EXBGF for creating new transformation scripts should
be stated with caution.

EXBGF was first described as an idea as a part of
[Zay12ah]. After its rejection, it was developed further
and laid out in much more detail in a journal submission,
which was also eventually rejected [Zay12m]. The fact
that I presented Extended XBGF first as a "trend" and
then as an "experiment" perfectly reflects my point of
view that it is not a solid contribution on its own.



3.2.5 ABGF? 

If there was one good outcome of getting a grammar
transformation paper [Zay12ah] rejected at a functional
programming conference, then this is it: I started
contemplating how to specify them in a not-so-functional
way. Having recently been to a bidirectional
transformations workshop helped, and I started
researching tridirectional transformations (in fact, they
quickly turned multidirectional). The idea was clean and
simple: do not specify grammar changes as functions;
instead, specify them as predicates. Such a predicate
would, for example, introduce a nominal binding between
nonterminals in different grammars, after which the
actual renaming steps can be easily inferred from such a
binding predicate.
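
For a single step, the idea can be sketched in Python as follows.
All names here (Binding, holds, infer_renames) are hypothetical
and chosen for illustration only; they are not taken from any of
the prototypes mentioned in this section. The binding is a set of
name pairs, the predicate checks that it makes sense for two
given grammars, and the renaming steps are derived from it in
whichever direction is needed.

```python
from typing import Dict, List, Set, Tuple

# Assumed toy representation: nonterminal -> alternatives -> symbols
Grammar = Dict[str, List[List[str]]]
Binding = Set[Tuple[str, str]]  # (name in grammar A, name in grammar B)


def holds(binding: Binding, a: Grammar, b: Grammar) -> bool:
    """The predicate: every bound pair names a nonterminal on each side."""
    return all(x in a and y in b for x, y in binding)


def infer_renames(binding: Binding, towards_b: bool = True) -> List[Tuple[str, str]]:
    """Derive rename steps (old, new) from the binding, in either direction."""
    pairs = sorted(binding)
    steps = pairs if towards_b else [(y, x) for x, y in pairs]
    return [(old, new) for old, new in steps if old != new]


concrete = {"expr": [["term"]], "term": [["ID"]]}
abstract = {"Expression": [["Term"]], "Term": [["ID"]]}
binding = {("expr", "Expression"), ("term", "Term")}

print(holds(binding, concrete, abstract))
print(infer_renames(binding))                   # steps from concrete to abstract
print(infer_renames(binding, towards_b=False))  # steps in the other direction
```

Note that the binding says nothing about the order in which the
inferred steps should run, which is harmless for independent
renamings but not in general.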

Unfortunately, this idea, so beautiful in theory, proved
nearly impossible in practice (or in detailed theory, for
that matter). The main problem lies with the order of
execution: a functional grammar transformation script
specifies that order naturally, while a list of
predicates does not. As I found out the hard way, my
prototypes were still clean and beautiful when they dealt
with one transformation step; reasonable tricks and
extensions could let me go up to three steps; beyond that
some serious redesign was needed; and so far I have not
figured out how to overcome this.



3.3 Metasyntax 

Whenever we have a software language, we can speak of its
syntax as the way it allows and disallows structural
combinations of elements: programming languages rely on
keywords and possibly layout conventions; spreadsheets
have ways of distinguishing between cells and referring
to one from another; markup languages have symbol
sequences of special meaning; musical notes are arranged
on a grid; graphs must have uniquely identifiable nodes
and edges connecting exactly two each; etc. Then, a
metasyntax is a way of specifying this syntax. In the
classic programming language theory, languages are
textual and can be processed as sequences of lexemes, and
the metasyntax is Backus Normal Form [Bac60], also called
Backus Naur Form [Knu64], or its enhanced variant
Extended Backus Naur Form [Wir77]. Despite the fact that
EBNF has been standardised by ISO [ISO96], there is no
agreement in the software language engineering community
on the exact variant of EBNF: some people just prefer
using ":=" or "::=" instead of "=" for esthetic reasons,
or prefer separating production rules with double
newlines for readability reasons and for the sake of easy
processing.

The idea was hinted at in my PhD thesis in 2010 [Zay10],
completely worked out in 2011 and was put to several good
uses in 2012. These are listed in the following
subsections.

                    jls1    jls2    jls3   jls12  jls123     r12    r123   Total
XBGF, LOC            682    6774   10721    5114    2847    1639    3082   30859
EXBGF, LOC           399    5509    7524    3835    2532    1195    2750   23744
                    -42%    -19%    -30%    -25%    -11%    -27%    -11%    -23%
genXBGF, LOC         516    5851    9317    4548    2596    1331    2667   26826
                    -24%    -14%    -13%    -11%     -9%    -19%    -13%    -13%
XBGF, nodes          309   3,433   5,478   2,699   1,540     786   1,606   15851
EXBGF, nodes         177   2,726   3,648   1,962   1,377     558   1,446   11894
                    -43%    -21%    -33%    -27%    -11%    -29%    -10%    -25%
genXBGF, nodes       326   3,502   5,576   2,726   1,542     798   1,610   16080
                     +6%     +2%     +2%     +1%   +0.1%     +2%   +0.3%     +1%
XBGF, steps           67     387     544     290     111      77     135    1611
EXBGF, steps          42     275     398     214      98      50     120    1197
...pure EXBGF         27     104     162      80      30      34      44
...just XBGF          15     171     236     134      68      16      76
                    -37%    -29%    -27%    -26%    -12%    -35%    -11%    -26%
genXBGF, steps        73     390     555     296     112      83     139    1648
                     +9%     +1%     +2%     +2%     +1%     +8%     +2%     +2%

Table 2: Size measurements of the Java grammar convergence case study, done in XBGF and in EXBGF. In the
table, XBGF refers to the original transformation scripts, EXBGF to the transformations in Extended XBGF,
genXBGF measures XBGF scripts generated from EXBGF. LOC means lines of code, calculated with wc -l;
nodes represent the number of nodes in the XML tree, calculated by XPath; steps are nodes that correspond
to transformation operators and not to their arguments. Percentages are calculated against the XBGF scripts
of the original study.



3.3.1 Notation specification 

The first step in treating metalanguages as first class
entities is, of course, encapsulating a particular
metalanguage with a specification that defines it. By
extending the list of possible metasymbols from the ISO
EBNF standard [ISO96] and by reusing the empirically
constructed Table 6.1 from my thesis [Zay10, p. 135], I
was able to construct such a specification, which was
subsequently named EDD, for EBNF Dialect Definition. It
was then turned into a small nicely packaged paper for
the PL track of SAC [Zay12d]. The very fact that it was
published separately gave me a lot of freedom later, when
I did not feel the need to introduce all the metasymbols
all over again in each work that followed.

3.3.2 Transforming metasyntaxes 

Once you have a notation specification as a first class
entity, you can define transformations on it. This was
probably the first transformation language that I have
designed where the main complexity was not in defining
the transformation operators as such, but rather in
coupling them with the grammar transformation steps that
they imply. The transformation suite consisted of just
three operators:

rename-metasymbol(s, v1, v2) where s is the metasymbol
and values v1 and v2 are strings
For example, we can decide to update the notation
specification from using ":" as a defining metasymbol to
using "::=". This is the most trivial transformation, but
also bidirectional by nature.

introduce-metasymbol(s, v) where s is the metasymbol and
v is its desired string value
For example, a syntactic notation can exist without a
terminator metasymbol, and we may want to introduce one.

eliminate-metasymbol(s, v) where s is the metasymbol and
v is its current string value
Naturally, eliminate and introduce together form a
bidirectional pair. Specifying the current value of a
metasymbol is not necessary, but enables extra
validation, as well as trivial bidirectionalisation.
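
The three operators can be sketched over a deliberately
simplified notation specification. The Python below is an
illustration only: it assumes that a notation is just a
dictionary from metasymbol names to their string values, and the
function names simply mirror the operator names above. It does
not model the coupled grammar transformation steps, which is
where the real complexity lies.

```python
from typing import Dict, Optional

# Assumed simplification: metasymbol name -> its string value
Notation = Dict[str, str]


def rename_metasymbol(n: Notation, s: str, v1: str, v2: str) -> Notation:
    """Change the value of metasymbol s from v1 to v2."""
    if n.get(s) != v1:
        raise ValueError(f"{s!r} is not currently {v1!r}")
    return {**n, s: v2}


def introduce_metasymbol(n: Notation, s: str, v: str) -> Notation:
    """Add metasymbol s with value v; it must not exist yet."""
    if s in n:
        raise ValueError(f"{s!r} is already defined")
    return {**n, s: v}


def eliminate_metasymbol(n: Notation, s: str, v: Optional[str] = None) -> Notation:
    """Remove metasymbol s; if v is given, validate the current value first."""
    if s not in n or (v is not None and n[s] != v):
        raise ValueError(f"{s!r} cannot be eliminated")
    return {k: val for k, val in n.items() if k != s}


notation = {"defining": ":", "definition-separator": "|"}
updated = rename_metasymbol(notation, "defining", ":", "::=")
terminated = introduce_metasymbol(updated, "terminator", ";")

# Each step has an obvious inverse, hence the bidirectional flavour.
print(eliminate_metasymbol(terminated, "terminator", ";") == updated)
print(rename_metasymbol(updated, "defining", "::=", ":") == notation)
```

Passing the current value to eliminate_metasymbol is what makes
the inverse (an introduce with the same arguments) derivable
without consulting the notation itself.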

Yet, the final megamodel of the infrastructure that did
not even consider language instances (only grammars and
metasyntaxes) looked as complex as Figure 3. The paper
about evolution of metalanguages had a bidirectionality
flavour and was conditionally accepted at the BX workshop
[Zay12p], and then also for the journal special issue
[Zay12r].

Figure 3: Components of a notation evolution: σ, a bidirectional notation specification transformation that
changes the notation itself; δ, a convergence relationship that can transform the notation grammars; γ, a
bidirectional grammar adaptation that prepares a beautified readable version of N'; μ, a unidirectional coupled
grammar mutation that migrates the grammarbase according to notation changes; possibly μ', a unidirectional
coupled grammar mutation that migrates the grammarbase according to the inverse of the intended notation
changes.

3.3.3 Notation-parametric grammar recovery

In all previously published grammar recovery initiatives
[BSV97; LV99; SV00; LV01a; LV01b; Lam05; Zay05; LZ09a;
Zay10; LZ11; Zay11b; Zay12ae] the step of transforming
the raw grammar-containing text obtained from the
language manual was either not automated (the grammar was
re-typed from scratch in the notation required by the
target grammarware framework), or semi-automated
(comprising many rounds of test-driven improvement), or
automated with a throwaway tool (one that cannot be
reused unless the replication deploys exactly the same
EBNF dialect). Having a notation specification as a first
class entity, we can step up from throwaway tools to
throwaway notation specifications: at least they take
minutes to create, not days.
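
A minimal Python sketch may help to see why a notation
specification is cheaper than a tool. It assumes a drastically
reduced specification with only three metasymbols (defining,
definition-separator and terminator) and a hypothetical function
recover; the actual method handles many more metasymbols and
repairs damaged input, which this sketch does not attempt.

```python
from typing import Dict, List

Notation = Dict[str, str]             # metasymbol name -> string value
Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternatives -> symbols


def recover(text: str, notation: Notation) -> Grammar:
    """Split raw grammar text into productions according to the notation."""
    grammar: Grammar = {}
    for rule in text.split(notation["terminator"]):
        if not rule.strip():
            continue  # skip the empty tail after the last terminator
        name, _, body = rule.partition(notation["defining"])
        alternatives = body.split(notation["definition-separator"])
        grammar.setdefault(name.strip(), []).extend(
            alt.split() for alt in alternatives)
    return grammar


# Two made-up dialects describing the same grammar.
dialect_a = {"defining": ":", "definition-separator": "|", "terminator": ";"}
dialect_b = {"defining": "::=", "definition-separator": "/", "terminator": "."}

text_a = "expr : term | expr PLUS term ; term : ID ;"
text_b = "expr ::= term / expr PLUS term . term ::= ID ."

print(recover(text_a, dialect_a))
print(recover(text_a, dialect_a) == recover(text_b, dialect_b))
```

The same function serves both dialects; only the three-entry
dictionary changes. A terminal symbol that happens to contain a
metasymbol would break this naive splitting, which is exactly
the kind of case a real notation specification has to cover.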

Notation-parametric grammar recovery [Zay12y; Zay12x] was
my best result of 2011, and this year it was officially
published and put to several good uses. These uses are
not exactly publishable, simply because grammar recovery
from (nearly) well-formed input has become a trivial
process itself, but there was one story that was enabled
by this triviality. The grammar of MediaWiki syntax, for
the recovery of which I have a previously exposed
preprint [Zay11b], is a unique case of using multiple
notations within one community-created grammar. With any
other recovery method, it would have been easier to just
retype the grammar in a uniform fashion, but
notation-parametric grammar recovery made it possible to
treat all six different incoherent metalanguages with
relative ease and derive the final grammar from the
inconsistent input. A continuation of this topic was
intended to be a published closure on the case of
MediaWiki grammar recovery, but was unfortunately
rejected in the end [Zay12ac].

3.3.4 Notation-driven grammar convergence 

Grammar convergence was originally a lightweight
verification method not intended for full automation
[LZ09a]. However, seeing how many transformations, which
were in fact converging grammars, could be inferred
automatically for the metalanguage evolution case study
[Zay12t] (see also §3.3.2), I could not help wondering
whether and to what extent it was possible to drive the
automated convergence process by the notation properties.
The result of that was the methodology of guided grammar
convergence, which was already covered by §3.1.

3.4 Tolerance in parsing 

Originally, the "parsing in the cloud" paper [Zay12n;
Zay12o] was intended to present a useful crossing of the
in-the-cloud and as-a-service paradigm with the
engineering discipline for grammarware. However, the
related work digging quickly got out of hand and turned
into a contribution of its own. The overview of many
grammar-based techniques with some level of tolerance
towards their input data and its weak commitment to
grammatical structure was presented at the PEM Colloquium
[Zay12ag] (see also §3.8.8), where it was received very
warmly and led to many useful insights. Both reviewers
and colleagues advised me to put more effort into
demonstrative prototypes and publish the overview with
them separately from the parsing algorithm (see §3.8.2)
itself. This is one of the planned activities for 2013.

So far, at least the following tolerant parsing methods
have been identified: ad hoc lexical analysis [BSV00;
KLV05b], hierarchical lexical analysis [MN95], iterative
lexical analysis [Cox03], fuzzy parsing [Kop97], parsing
incomplete sentences [Lan88], island grammars [DK99],
lake grammars [Moo01], robust multilingual parsing
[SCD03], gap parsing [BN05], bridge grammars [NNEH09],
skeleton grammars [KL03], breadth-first parsing [Lce67;
Oph97], grammar relaxation [ASU85], agile parsing
[Dea+03], permissive grammars [Kat+09], hierarchical
error repair [BH82], panic mode [ASU85], noncorrecting
error recovery [Ric85], precise parsing [AU72]. It
remains to be seen whether they form a straight spectrum
from lexical analysis to strict syntactic analysis.

3.5 Megamodelling 

In computer science, modelling happens when a real
artefact is represented by its abstraction, which is then
called a model; metamodelling happens when the structure
of such models is analysed and expressed as a model for
models, or a metamodel; and megamodelling happens when
the infrastructure itself, involving multiple models and
metamodels, is modelled. The need for megamodels has been
advocated since at least 2004 [BJV04; FN04].

The current state of the art is: in the simplest cases,
people do not need a special formalism to state that, for
example, "models A and B conform to the metamodel C"; in
somewhat more complicated scenarios scientists and
engineers tend to develop their own domain-specific ad
hoc megamodelling methodologies and employ them in narrow
domains; and in truly complex situations, any existing
approach only adds to complexity, overwhelming
stakeholders with yet another view on the system
architecture. However, at least one solid business case
was found for megamodelling: the problem of comparing
different technological spaces [KBA02], for example,
comparing the relations between XML documents, schemata,
data models and validators with the relations between
object models, source code and compilers.

At the University of Koblenz-Landau, the Software
Languages Team is dedicated to developing a general
purpose megamodelling language called MegaL [FLV12].
After attending presentations about MegaL on several
occasions, I paid a working visit to them in July. The
consequences of that visit: I tried to use MegaL for my
own megamodelling needs on several occasions [Zay12u;
Zay12v; Zay12k], I have presented an extensive overview
of currently existing ad hoc megamodelling techniques
(see §3.5.1), and I have proposed my own method of
dealing with overly complex megamodels (see §3.5.2).

3.5.1 MegaL dissection 

So far at least these previously existing ad hoc
megamodelling approaches have been spotted: ATL [Jou+08],
UNCOL [Bra61; Con58], tombstone [MHW70], grammarware
megamodelling [KLV05a], software evolution megamodelling
[FN04], evolution of software architectures [Gra07a;
Gra07b], MEGAF [Hil+10], global model management
[Vig+11], grammar convergence [LZ09a; Zay10; Zay11a],
software language engineering [Zay10; Zay+08], modelling
language evolution framework [MV11], metasyntactic
evolution [Zay12p; Zay12r].

My superficial overview of them, comparing them with
MegaL, was presented to the MegaL designers in July
[Zay12s], and my current research activities include
active collaboration with them, with a paper presenting a
unified model for megamodelling in mind.

3.5.2 Renarrating megamodels

Seeing enough presentations on megamodelling made me
realise that they are very easy to follow even for
untrained people, unlike the resulting megamodels, which
contain far too much detail and are very intimidating.
So, my take on this problem was introducing two
operations: slicing (to make megamodels smaller) and
narrating (to traverse the elements in the megamodel). If
we have them, we can take the baseline megamodel that
only experts can try to understand, and cut it into
consumable chunks bundled with a story that introduces
the remaining elements one by one and explains each step.
The resulting paper was sent to a workshop on
Multiparadigm Modelling, where it was presented as a
poster [Zay12aa], published in online pre-proceedings
[Zay12ab] and is currently on its way to the
post-proceedings in the ACM DL [Zay12ac].
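
As a rough illustration of the two operations, consider the
Python sketch below. It is a simplification under stated
assumptions, not the definitions from the paper: a megamodel is
reduced to a list of (source, relation, target) triples, and the
names Edge, slice_megamodel and narrate are hypothetical.

```python
from typing import Iterator, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


def slice_megamodel(edges: List[Edge], keep: Set[str]) -> List[Edge]:
    """Slicing: keep only the relations between the chosen elements."""
    return [e for e in edges if e[0] in keep and e[2] in keep]


def narrate(edges: List[Edge]) -> Iterator[str]:
    """Narrating: introduce elements one by one, then relate them."""
    seen: Set[str] = set()
    for source, relation, target in edges:
        for node in (source, target):
            if node not in seen:
                seen.add(node)
                yield f"Introduce {node}."
        yield f"{source} {relation} {target}."


megamodel = [
    ("A", "conformsTo", "C"),
    ("B", "conformsTo", "C"),
    ("C", "conformsTo", "MM"),
    ("T", "transforms", "A"),
]

chunk = slice_megamodel(megamodel, {"A", "B", "C"})
for sentence in narrate(chunk):
    print(sentence)
```

The slice drops everything about MM and T, and the story for the
remaining chunk never mentions an element before introducing it.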

3.6 Grammar repository 

My first project proposal ever, titled "Automated
Reuse-driven Grammar Restructuring", was sent to the NWO
Veni program in January, passed a rebuttal phase in May
and was finally rejected in July after informing me that
it ended up in the category "very good" [Zay12a]. The
idea described there was small and elegant: mining
grammarware repositories. While repository mining
techniques receive quite some attention nowadays, very
few people actually have entire repositories filled with
grammars: let's face it, they are omnipresent yet at the
same time scarce. However, I already have this initiative
called Grammar Zoo [Zay+08], which contains many grammars
of languages big and small, and armed with the arsenal of
extraction tools developed in my PhD time, it can grow
even more. The goal of such mining is, of course, to
reverse engineer reusable grammar fragments and forward
engineer the discipline of their composition.

A paper advocating the need for and the usefulness of the
repository itself was written and submitted to a journal
in November [Zay13]. The outcome will only become known
in 2013.

3.7 (Open) Notebook Science 

Open Notebook Science is an open science paradigm of
doing research in a transparent way [LI0O8]. It involves
keeping a lab notebook that collects all data and
metadata on experiments, hypotheses, results, details and
other observations that occur during the research phase,
so that after the final objective is reached (or deemed
unreachable), the complete path towards it can be exposed
and made publicly available for inspection, replication
and reuse. The open notebook approach is fairly
well-known and somewhat popular in fields like biology
and chemistry [San08; Sin08], which thrive on
experimental frameworks and traditionally involve lab
notebooks, so in practice exercising this approach had
the only consequence of sharing the already existing
notebook and systematically referring to it from the
papers. In computer science, however, there are few to no
adopters of this approach, mainly due to the seeming
complexity of the method, the amount of extra effort that
is needed to set up and to maintain such a lab notebook,
and the lack of positive feedback from it in the form of
community encouragement and peer acknowledgement.

During the Software Freedom Day, I gave a presentation
explaining one feasible way to start practicing open
notebook science for computer science and software
engineering researchers, with the case study of myself
[Zay12z]. A couple of days later the SL(E)BOK organisers
heard about it and asked me to record a keynote
presentation [Zay12af] about that, linking open access
ideas with the existing research on "scientific knowledge
objects" (SKO) and on a "body of knowledge" [Giu+10;
Sim+11].

In short, open notebook science strives to enable open
access to atomic SKOs; to expose all the dark data
[Goe07] from failed experiments and unpublished results;
to self-archive [Har01] subatomic SKOs, which are
relevant for the final result, but smaller than a
"publon". Examples of subatomic SKOs include:

• Commits to an open source repository; 

• Tweets on work-related subjects; 

• Quora answers on work-related topics; 

• Papers: preprints, reports, drafts, etc; 

• Presentations: slides, screencasts, etc; 

• Blog posts; 

• Wiki edits; 

• Exposed tools; 

• Documentation; 

• Shared raw data; 

• Auxiliary material. 

As has been pointed out to me by some of the attendees of
both talks, the topic of subatomic SKOs is bigger than
just open notebook computer science, because if I can
show the usefulness of keeping a notebook of actions for
a researcher, it does not necessarily mean that the
notebook must be public to profit from its traceability.
The first comprehensive paper on this topic is still in
the process of being designed, but hopefully will be
submitted somewhere during the next year or two.



3.8 Minor topics 

In addition to the topics and achievements I consider
major for 2012, there are several lesser contributions:
they are either topics that did not receive enough
attention to yield a solid major contribution (yet not
insignificant enough to be omitted from the report
completely), or just not traditionally considered worthy
of mentioning (programming, engineering, organising
effort).

One topic is intentionally hidden from this section, 
in order to prevent jeopardising an upcoming submis- 
sion to a strictly double blind peer reviewed venue. 

3.8.1 Grammar mutation 

In the paradigm of programmable grammar transfor- 
mations, the semantics of each of the transformation 
operators is bound to the operator itself, and may 
require arguments to be provided before the actual 
input grammar. Such partially evaluated operators 
(with all arguments provided, but no input grammar 
yet) are treated as transformation steps, and their ap- 
plicability constraints only depend on the grammar: 
if they hold, the change takes place; if they do not, 
an error occurs instead. In other words, the exact 
consequence of the transformation step depends on 
operands, not on the grammar. However, those ap- 
plicability constraints can also be processed as fil- 
ters: whatever part of the grammar satisfies them, 
will be transformed — that way, the exact change in 
the grammar depends on the grammar, not on the 
operands. 

As an example, consider renaming grammatical symbols:
"rename nonterminal" itself is an operator. Its semantics
can be expressed easily on the classic definition of a
grammar. If the input grammar is $G = (N, T, P, S)$, then
the output must be

$$G' = \bigl((N \setminus \{x\}) \cup \{y\},\ T,\ P|_{x \to y},\ S'\bigr)$$

where $x$ and $y$ are operands; $S'$ is $S$ unless
$S = x$ and $y$ otherwise; and $A|_{x \to y}$ means
substitution (for example, by term rewriting). When $x$
and $y$ are provided, then $G'$ above becomes fully
defined and yields meaningful results when applicability
conditions (e.g., $x \in N$ and $y \notin N$) are
satisfied. Renaming a terminal symbol is specified
similarly.

However, "renaming all lowercase nonterminals to
uppercase" is not an operator (or at least, even if it is
made one, it will be of much higher level than the simple
"rename"), and it is not an atomic transformation step
either: in fact, it can lead to any number of changes in
the grammar from 0 to $|N|$, depending on $G$. This
number absolutely cannot be known before $G$ is provided.
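
The contrast between the operator and the mutation can be made
concrete with a small Python sketch. The tuple representation of
a grammar and the function names rename_nonterminal and
uppercase_all are assumptions made for illustration; they are not
the XBGF implementation.

```python
from typing import Dict, List, Set, Tuple

Productions = Dict[str, List[List[str]]]
Grammar = Tuple[Set[str], Set[str], Productions, str]  # (N, T, P, S)


def rename_nonterminal(g: Grammar, x: str, y: str) -> Grammar:
    """Operator: with x and y given, one well-defined change or an error."""
    n, t, p, s = g
    if x not in n or y in n:  # applicability conditions
        raise ValueError(f"cannot rename {x!r} to {y!r}")
    sub = lambda a: y if a == x else a  # substitution of x by y
    p2 = {sub(lhs): [[sub(a) for a in rhs] for rhs in alts]
          for lhs, alts in p.items()}
    return ((n - {x}) | {y}, t, p2, sub(s))


def uppercase_all(g: Grammar) -> Tuple[Grammar, int]:
    """Mutation: the condition acts as a filter over the whole grammar."""
    changes = 0
    for x in sorted(g[0]):
        if x.islower() and x.upper() not in g[0]:
            g = rename_nonterminal(g, x, x.upper())
            changes += 1
    return g, changes


g = ({"expr", "term", "ID_LIST"}, {"+", "id"},
     {"expr": [["term"], ["expr", "+", "term"]],
      "term": [["id"]],
      "ID_LIST": [["id"], ["ID_LIST", "id"]]},
     "expr")

mutated, changes = uppercase_all(g)
print(changes)
print(mutated[2])
```

The operator needs its operands before it can do anything, while
the mutation needs only the grammar, and the number of renamings
it performs is known only after that grammar has been inspected.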

This kind of grammar manipulation was first identified as
a part of research on bidirectional transformations
[Zay12p; Zay12r; Zay12q; Zay12c] (because they are not
bidirectionalisable), where it received the name of
"grammar mutation". Later there was an endeavour to
compose a comprehensive list of useful grammar mutations
as a part of [Zay12ah], but it was rejected.

For the sake of providing a better overview of the 
current state of research on grammar mutations, I col- 
lect all of them in the exhaustive list below. Note that 
conceptually the same mutations may have been ap- 
pearing under different names in various sources: for 
example, the first mutation in the list, "remove all 
terminal symbols" , has previously been known as a 
transformation "stripTs" [LZ09a, §5.3] and as a gen- 
erator ''striptxhgP [ZaylO, §4.9,§4.10.6.1]. 

Remove all terminal symbols [LZ09a; Zay12h; Zay12i;
Zay12j; Zay10; Zay11a]
A simple grammar mutation that is helpful when converging
a concrete syntax and an abstract syntax of the same
intended language. While the abstract syntax definition
may have differently ordered parameters of some of its
constructs, and full convergence will require dealing
with them and rearranging the structure with (algebraic)
semantic-preserving transformations, we will certainly
not encounter any terminal symbols and can safely employ
this mutation.

Remove all expression selectors [LZ09a; Zay12h; Zay12i;
Zay12j; Zay10]
Named (selectable) subexpressions are encountered in many
contexts, but the choice of names for them is usually
even more subjective than the naming convention for the
nonterminal symbols.

Remove all production labels [Zay12h; Zay12i; Zay12j]
Technically, having a production label is the same as
making a selectable subexpression out of the right hand
side of a nonterminal definition. Still, in some
frameworks the semantics and/or the intended use for
labels and for selectors differ.

Disciplined rename [Zay12p; Zay12r; Zay10; Zay11a]
There are several different well-defined naming
conventions for nonterminal symbols in current practice
of grammarware engineering, in particular concerning
multiword names. Enforcing a particular naming convention
such as making all nonterminal names uppercase or turning
camelcased names into dash-separated lowercase names, can
be specified as a unidirectional grammar mutation (one
for each convention).

Reroot to top [Zay11a; Zay12h; Zay12i; Zay12j]
A top nonterminal is a nonterminal that is defined in the
grammar but never used [LV01b]. In many cases it is
realistic to assume that the top nonterminals are
intended starting symbols (roots) of the grammar. A
variation of this mutation was used in §3.1 with an
additional requirement that a top nonterminal must not be
a leaf in the relation graph. This is a rational
constraint since a leaf top nonterminal defines a
separated component.

Eliminate top [Zay10; Zay11a] 

In situations where the root is known with certainty, 
we can assume all other top (unused) nonterminals 
to be useless, since they are unreachable 
from the starting symbol and are therefore 
not a part of the grammar. 

Extract subgrammar [Zay12h; Zay12i; Zay12j] 
Alternatively, we can generalise the last mutation 
to a parametrised one: given a grammar and 
a nonterminal (or a list of nonterminals), we can 
always automatically construct another grammar 
with the given nonterminal(s) as root(s) and the 
contents formed by all production rules of all 
nonterminals reachable from the assumed root 
nonterminals. Constructing a subgrammar starting 
with the already known roots will eliminate top 
nonterminals. 

Make all production rules vertical [Zay10; 
Zay11a; Zay12h; Zay12i; Zay12j] 
Vertical definitions contain several alternative 
production rules, while horizontal ones have 
one with a top level choice. There are different 
approaches known to handle this distinction, 
including complete transparency (one form being 
a syntactic sugar of the other). For normali- 
sation purposes or for quick convergence of a 
consistently vertical grammar and a consistently 
horizontal one, we can use this automated 
mutation. 

Make all production rules horizontal [Zay10; 
Zay11a] 

A similar grammar mutation is possible, yet 
much less useful in practice. 

Distribute all factored definitions [Zay12h; 
Zay12i; Zay12j] 

Aggressive factoring à la xbgf:distribute can 
also be discussed. Surfacing all inner choices in 
a given grammar is a powerful normalisation 
technique. 

Make all potentially horizontal rules vertical 
[Zay10; Zay11a] 

Technically, this mutation is a superposition 
of distribution of all factored definitions and 
converting all resulting horizontal production 
rules to an equivalent vertical form. 

Deyaccify all yaccified nonterminals [Zay10; 
Zay11a] 

A "yaccified" definition [Lam01; JM01] is named 
after YACC [Joh75], a compiler compiler, the 
old versions of which required explicitly defined 
recursive nonterminals, i.e., one would write 
A : B and A : A B, because in LALR parsers 
like YACC left recursion was preferred to right 
recursion (contrary to recursive descent parsers, 
which are unable to process left recursion 
directly at all). The common good practice 
in modern grammarware engineering is to use 
iteration metalanguage constructs such as B* 
for zero or more repetitions and B+ for one or 
more: this way, the compiler compiler can 
make its own decisions about the particular way 
of implementation, and will neither crash nor 
perform any transformations behind the scenes. 
However, many grammars [Zay+08] contain 
yaccified definitions, and usually the first step 
in any transformation that attempts to reuse 
such grammars for practical purposes is 
deyaccification, which can be easily automated. 

Remove lazy nonterminals [Zay10; Zay11a] 

Many grammars, in particular those that strive 
for better readability or for generality, contain an 
excessive number of nonterminals that are used only 
once or chain production rules that are unnecessary 
for parsing and for many other activities one 
can engage in with grammars. We have used an 
optimising mutation that removes such elements 
with xbgf:inline and xbgf:unchain on several occasions, 
including improving readability of automatically 
generated grammars. 

Normalise to ANF [Zay12h; Zay12i; Zay12j] 

The Abstract Normal Form (ANF) was introduced 
in §3.1 as a means of limiting the search 
space for guided grammar convergence. Technically, 
such normalisation is equivalent to a su- 
perposition of removing all labels, removing all 
selectors, removing all terminals, surfacing all in- 
ner choices, converting all horizontal production 
rules to a vertical form, rerooting to top non-leaf 
nonterminals and eliminating others unreachable 
from them. For conceptual foundations of ANF 
the reader is redirected to the article where it was 
proposed. 

Fold all grouped subexpressions [Zay12p; 
Zay12r] 

In the context of metalinguistic evolution, we 
need to construct a coupled mutation for the 
grammarbase, if the notation change contains 
retiring of a metasyntactic construct that is in 
use. One such construct is the possibility 
to group symbols together in an atomic sub- 
sequence — a feature that is often taken for 
granted and therefore misused, improperly doc- 
umented or implemented. Naturally, eliminating 
grouped subexpressions entails folding them 
to newly introduced nonterminals by means of 
xbgf:extract. 

Explicitly encode all separator lists [Zay12p; 
Zay12r] 

Our internal representation of grammars for 
software languages, following many other syntac- 
tic notations, contains a construct for defining 
separator lists. For example: {A ","}+ is a 
syntactic sugar for A ("," A)* or (A ",")* A 
— all three variants specify a comma-separated 
list of one or more As. When such a construct 
needs to be retired from the notation, the 
coupled grammar mutation must refactor its 
occurrences to explicitly encode separator lists 
with one of the equivalent alternatives. 
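
Two of the mutations listed above are simple enough to sketch in a few lines. The following Python fragment is only an illustration under stated assumptions: the tuple encoding of expressions and the function names deyaccify and explicit_separator_lists are hypothetical, and they are neither the XBGF operators nor the internal representation used in the cited work. The sketch recognises only the left-recursive pattern A : B, A : A B and only the {x sep}+ form of separator lists.

```python
from typing import Dict, List

# Hypothetical mini-representation (an assumption, not the BGF data model):
#   ("n", "A")             nonterminal
#   ("t", ",")             terminal
#   ("seq", e1, e2, ...)   sequence
#   ("star", e), ("plus", e)   iteration
#   ("sepplus", e, sep)    separator list {e sep}+
Expr = tuple
Grammar = Dict[str, List[Expr]]  # nonterminal -> list of alternatives


def explicit_separator_lists(e: Expr) -> Expr:
    """Rewrite every {x sep}+ into x (sep x)*, working bottom-up."""
    tag = e[0]
    if tag in ("n", "t"):
        return e
    if tag == "sepplus":
        x = explicit_separator_lists(e[1])
        sep = explicit_separator_lists(e[2])
        return ("seq", x, ("star", ("seq", sep, x)))
    return (tag,) + tuple(explicit_separator_lists(c) for c in e[1:])


def deyaccify(g: Grammar) -> Grammar:
    """Replace the pair  A : B  and  A : A B  by the single rule  A : B+."""
    out: Grammar = {}
    for a, alts in g.items():
        out[a] = alts
        if len(alts) != 2:
            continue
        # Try both orders of the two alternatives.
        for base, rec in (alts, alts[::-1]):
            if rec == ("seq", ("n", a), base):
                out[a] = [("plus", base)]
    return out


yaccified = {
    "A": [("n", "B"), ("seq", ("n", "A"), ("n", "B"))],
    "B": [("t", "b")],
}
print(deyaccify(yaccified)["A"])
print(explicit_separator_lists(("sepplus", ("n", "A"), ("t", ","))))
```

Choosing x (sep x)* over (x sep)* x is arbitrary here; as noted above, both encodings specify the same lists.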

A full-fledged paper shining enough light on grammar 
mutations is still being written and will hit the 
submission desks in 2013. 
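
The rerooting, top elimination and subgrammar extraction mutations from the list above all rest on the same reachability computation. A minimal sketch follows, assuming a simplified grammar encoding (a dictionary from nonterminals to alternatives, each alternative a list of symbol names) and hypothetical function names. It also assumes that a nonterminal referring only to itself still counts as unused, which is one possible reading of the definition of a top nonterminal.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical simplification: a symbol is a nonterminal iff it is a key
# of the grammar; every other symbol is treated as a terminal.
Grammar = Dict[str, List[List[str]]]


def used(g: Grammar) -> Set[str]:
    """Nonterminals that occur in the definition of some other nonterminal."""
    return {s for n, alts in g.items() for alt in alts for s in alt
            if s in g and s != n}


def top_nonterminals(g: Grammar) -> Set[str]:
    """Defined but never used: candidates for roots of the grammar."""
    return set(g) - used(g)


def extract_subgrammar(g: Grammar, roots: Iterable[str]) -> Grammar:
    """Keep only the nonterminals reachable from the given roots."""
    reachable: Set[str] = set()
    todo = list(roots)
    while todo:
        n = todo.pop()
        if n in reachable or n not in g:
            continue
        reachable.add(n)
        todo.extend(s for alt in g[n] for s in alt)
    return {n: alts for n, alts in g.items() if n in reachable}


g = {
    "program": [["function"], ["function", "program"]],
    "function": [["'def'", "expr"]],
    "expr": [["'x'"]],
    "orphan": [["expr"]],
}
print(sorted(top_nonterminals(g)))
print(sorted(extract_subgrammar(g, ["program"])))
```

Extracting the subgrammar from the known root "program" drops "orphan", which is the effect described for eliminating top nonterminals.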

3.8.2 Iterative parsing 

As the main (intended) contribution of [Zay12n; 
Zay12o], I have proposed an algorithm for iterative 
parsing. The basic idea is very simple: we take the 
baseline grammar and skeletonise it as far as it can 
be automated, in such a way that the relation between 
the "lakes" and the nonterminals in the baseline 
grammar is preserved. Then, our parse tree will give 
the basic structure and a number of watery fragments 
parsed with useless lake grammars (usually in a form 
of "anything but newline" or "something in balanced 
curly brackets"). If needed, any of those lakes can 
be parsed further with a subgrammar of the baseline 
grammar, with the new root being the nonterminal 
that corresponds to the lake. 
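
The idea can be illustrated with a toy language of named blocks, name { body }. The sketch below is an assumption-laden illustration and not the algorithm of the cited papers: the skeleton grammar, the Lake class and the "body" subparser are all hypothetical. The first pass finds only the block structure and stores each body as a lake of balanced curly brackets; a lake is parsed further only when requested.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Lake:
    """Unparsed fragment that remembers its baseline nonterminal."""
    nonterminal: str
    text: str
    tree: Optional[list] = None  # filled in lazily

    def parse(self, subparsers: Dict[str, Callable[[str], list]]) -> list:
        if self.tree is None:
            self.tree = subparsers[self.nonterminal](self.text)
        return self.tree


HEADER = re.compile(r"\s*(\w+)\s*\{")


def skeleton_parse(src: str) -> List[Tuple[str, Lake]]:
    """Skeleton grammar: (name '{' lake '}')*, lake = balanced brackets."""
    result, i = [], 0
    while True:
        m = HEADER.match(src, i)
        if not m:
            break
        depth, j = 1, m.end()
        while depth:
            if j >= len(src):
                raise SyntaxError("unbalanced curly brackets")
            depth += {"{": 1, "}": -1}.get(src[j], 0)
            j += 1
        result.append((m.group(1), Lake("body", src[m.end():j - 1])))
        i = j
    return result


# Hypothetical subgrammar for the "body" nonterminal: ';'-separated items.
SUBPARSERS = {"body": lambda t: [s.strip() for s in t.split(";") if s.strip()]}

tree = skeleton_parse("f { a; b { c } } g { d }")
print([name for name, _ in tree])
print(tree[1][1].parse(SUBPARSERS))  # only the second lake gets parsed
```

Keeping the nonterminal name inside each lake is what allows the later pass to pick the right subgrammar root.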

This parsing approach was being sold as "parsing 
in the cloud" in [Zay12n; Zay12o], which was certainly 
not the best (even though the coolest) way to look at 
it. Other applications for this form of lazy parsing can 
be found in debugging (disambiguation, fault locali- 
sation) and other areas that traditionally profit from 
laziness. This remains future work. 

3.8.3 Unparsing techniques 

One of the most confusing papers that I have submitted 
anywhere in 2012 was the one about unparsing 
techniques [Zay12ai]. Only after finishing writing it, I 
have realised how big and overwhelming this topic is. 
The paper was rightfully rejected after being classified 
as a "request for discussion": a much deeper survey 
of (some of) the presented topics must be composed 
sooner or later, but it requires much more careful 
consideration. I have not done much in this topic after that, 
but there was at least one paper published recently 
that explicitly considered unparsing [SCL12]. 

The starting idea is simple as a sunrise: there was 
a lot of effort put in researching parsing techniques, so 
why not the opposite? The unparsing techniques can 
be understood in a very broad sense: pretty-printing, 
syntax highlighting, structural import yielding an ed- 
itable textual representation, bidirectional construc- 
tion of equivalent views, etc. 
Some papers consider conservative pretty-printing 
as a way to preserve peculiar layout pieces (like mul- 
tiple spaces) during unparsing [Jon02; Ruc96]. This 
is a narrow application of a general idea of prop- 
agating layout through transformations, which is a 
long-standing and a well-researched problem. How- 
ever, even the most conservative unparsers have the 
risk of introducing an inconsistently formatted code 
fragment, if that code was originally introduced by 
a source code manipulation technique and not pro- 
duced by a parser. In other words, replacing a GO 
TO statement with a WHILE loop should look differ- 
ently, depending on how the code around the intro- 
duced fragment was formatted. Possibly, results from 
the grammar inference research field [SC12] can be 
reused for recovering formatting rules in some rea- 
sonable proximity of the code fragment in order to 
unparse it correctly and avoid code alienation. 

Suppose not just one desired textual formatted rep- 
resentation of the language instance exists, but several 
of them, which form a family, or a product line, like 
the line of metalanguages considered in §3.3 in the 
context of metasyntactic evolution. Following that example, 
suppose we are given a grammar in some internal 
representation and a syntactic notation specification 
[Zay12d], then it is somewhat trivial to construct 
an unparser that would produce the same grammar 
in a textual form. In other words, such an unparser 
should generate a text that, given a notation specifi- 
cation, can yield the same grammar after automated 
notation-parametric grammar recovery [Zay12x, §3]. 
However, other questions remain. How to find a 
minimal notation needed to unparse a given gram- 
mar? How in general to validate compatibility of a 
given grammar and a given notation? How to pro- 
duce grammar transformations (see §3.2) to make the 
grammar fit the notation, how to produce notation 
transformations (see §3.3.2) to make the notation fit 
the grammar, and how to negotiate to find a properly 
balanced outcome? These questions are not trivial 
and require investigation. Unparser-completeness has 
recently been studied in the context of template 
engines [ABS11]. 
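
To make the "somewhat trivial" direction concrete, here is a minimal sketch of a notation-parametric unparser. The notation is modelled as a dictionary of metasymbols; the key names (defining, choice, terminator and the quoting pairs) are hypothetical and only loosely inspired by the idea of a notation specification, and the two sample notations are illustrative approximations rather than faithful BNF or YACC definitions.

```python
from typing import Dict, List

Grammar = Dict[str, List[List[str]]]  # nonterminal -> alternatives -> symbols

# Two assumed notation specifications (illustrative only).
BNF_LIKE = {"defining": "::=", "choice": " | ", "terminator": "",
            "nt_open": "<", "nt_close": ">", "t_open": '"', "t_close": '"'}
YACC_LIKE = {"defining": ":", "choice": " | ", "terminator": " ;",
             "nt_open": "", "nt_close": "", "t_open": "'", "t_close": "'"}


def unparse(g: Grammar, notation: Dict[str, str]) -> str:
    """Render the same grammar in whichever notation is supplied."""
    def sym(s: str) -> str:
        if s in g:  # symbols defined in the grammar are nonterminals
            return notation["nt_open"] + s + notation["nt_close"]
        return notation["t_open"] + s + notation["t_close"]

    lines = []
    for n, alts in g.items():
        rhs = notation["choice"].join(
            " ".join(sym(s) for s in alt) for alt in alts)
        lines.append(f"{sym(n)} {notation['defining']} {rhs}"
                     f"{notation['terminator']}")
    return "\n".join(lines)


g = {"list": [["item"], ["list", ",", "item"]], "item": [["x"]]}
print(unparse(g, BNF_LIKE))
print(unparse(g, YACC_LIKE))
```

The open questions listed above start exactly where this sketch stops: it assumes that the grammar uses nothing the notation cannot express.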

Unparsers can also be viewed as a commitment 
to grammatical structure [KLV05a]. Can we recover 
grammars from them, compare and converge them 
with other grammars of the same language that we 
would like synchronised (e.g., concrete syntax def- 
inition intended for parsing, multiple abstract syn- 
taxes for performing various grammar-based analy- 
sis tasks, data models for serialisation)? Are there 
some specific properties that such grammars always 
possess? What is the minimal upper formalism for 
the baseline grammar from which grammars for pars- 
ing and unparsing can be derived automatically with 
a language-independent or language-parametric tech- 
nology? These questions are not trivial and require 
investigation. 



Connecting to the topic of robust/tolerant parsing 
(see §3.4), we can consider at least two kinds of 
techniques that act as its opposite: incremental unparsing 
and unparsing incomplete trees. By incremental 
unparsing I mean a modular technique for unparsing 
modified code fragments and combining them 
with the previously unparsed versions of the unmodified 
code fragments. This is usually not considered 
for simple cases, but is possibly worth investigating 
for large scale scenarios (consider architectural 
modifications to an IT portfolio with hundreds of millions 
of lines of code in dozens of languages). By unparsing 
incomplete trees I mean the process of unparsing 
structured representations of incomplete language 
instances. Besides scenarios when this technique is used 
together with tolerant/robust parsing (and then the 
lacking information may be somehow propagated to 
the unparser anyway), there are also other scenarios 
when the gaps are deliberately left out to be filled by 
the unparser. In documentation generation, this is 
the way code examples can be treated; for a sample 
implementation we refer to Rascal Tutor [Kli+12]. 

For construction of compiler compilers and simi- 
lar grammarware with unparsing facilities, there is a 
commonly encountered problem of bracket minimal- 
ity for avoiding constructions ambiguous for parsing: 
since brackets are there in the text only to guide the 
parsing process, they are removed from the AST, so 
how to put back as few of them as possible during 
unparsing? This is a typical research question for the 
unparsing techniques field. One could also investi- 
gate various ways to infer grammar adaptation steps 
needed to unparse the given grammatical structure in 
order to guarantee the lack of ambiguities if it is to be 
parsed again. 
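
For binary operators with known precedence and associativity, the bracket minimality problem has a well-known simple solution, sketched below. The operator table and the class names are assumptions made for this illustration. The unparser emits a pair of brackets only where reparsing the text would otherwise produce a different tree; it preserves the tree shape and does not exploit semantic associativity, so 1 + (2 + 3) keeps its brackets.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class Bin:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Bin]

# Assumed operator table: operator -> (precedence, associativity).
OPS = {"+": (1, "left"), "-": (1, "left"),
       "*": (2, "left"), "^": (3, "right")}


def unparse(e: Expr, min_prec: int = 0) -> str:
    """Bracket a subtree only if its operator binds weaker than required."""
    if isinstance(e, Num):
        return str(e.value)
    prec, assoc = OPS[e.op]
    # The operand on the non-associative side needs strictly higher precedence.
    left = unparse(e.left, prec if assoc == "left" else prec + 1)
    right = unparse(e.right, prec + 1 if assoc == "left" else prec)
    text = f"{left} {e.op} {right}"
    return f"({text})" if prec < min_prec else text


print(unparse(Bin("-", Num(1), Bin("-", Num(2), Num(3)))))
print(unparse(Bin("-", Bin("-", Num(1), Num(2)), Num(3))))
print(unparse(Bin("*", Bin("+", Num(1), Num(2)), Num(3))))
```

Grammars whose ambiguities are not resolved by a precedence table need more than this, which is where the research question above begins.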

3.8.4 Migration to git 

Following the current trend of leaving old-fashioned 
open source farms in favour of more modern 2.0 social 
coding websites, I have migrated the Software 
Language Processing Repository from SourceForge to 
GitHub [Zay+08]. The project was started in 2008 by 
Ralf Lämmel [Lam08] and quickly after that became 
the main target for my efforts and the main repository 
for my code. As of now (December 2012), it 
contains 954 revisions committed by me, 314 by Ralf 
Lämmel, 44 by Tijs van der Storm and 28 by all other 
contributors combined. 

This would not have been worth mentioning, had I 
not migrated all my other repositories to git as well, 
which enabled efficient linking to all of them from the 
open notebook (see §3.7). For closed source repositories 
(like the ones used for writing papers) we use 
Atlassian BitBucket instead of GitHub. 

3.8.5 Turing machine programming 

Two of my colleagues from Centrum Wiskunde & Informatica 
(CWI), Davy Landman and Jeroen van den 
Bos, have built a physical Turing machine with a finite 
tape and separate program space, from LEGO 
blocks [Bos+12]. We were all passively yet encourag- 
ingly watching them do that and then watching with 
excitement how the resulting machine could sum two 
and two in less than half an hour. From the soft- 
ware perspective, they have created a kind of "Turing 
assembly" DSL that consisted of commands for ac- 
cessing bits on the tape, moving the head and making 
decisions on the next command, and was translatable 
into some real code that could run on the LEGO chip 
brick. Then, there was a slightly more advanced DSL 
called "Turing level 2" developed on top of it, en- 
hanced with label names and repetition loops, as well 
as IDE support features like a visualiser/simulator. 

My spontaneous contribution to the project in- 
volved writing several programs for the machine in 
this "Turing language level 2" , including copying of 
unary numbers, incrementing them, performing var- 
ious forms of addition and finally multiplying two 
unary numbers. All these programs are publicly accessible 
at the official repository: 
http://github.com/cwi-swat/TuringLEGO/tree/master/examples. 
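
To give a flavour of such programs without reproducing the actual DSL, here is a sketch of unary addition on a finite binary tape, written as a plain transition table with a tiny simulator in Python. Everything in it is an assumption made for illustration: the state names, the encoding of the two numbers as runs of 1s separated by a single 0, and the table format are not taken from "Turing level 2" or from the programs in the repository.

```python
from typing import Dict, Tuple

# (state, symbol read) -> (symbol to write, head move, next state)
Program = Dict[Tuple[str, str], Tuple[str, str, str]]

# Unary addition: fill the separating 0 with a 1, walk to the end of the
# second number, and erase its last 1.
ADD: Program = {
    ("scan", "1"): ("1", "R", "scan"),
    ("scan", "0"): ("1", "R", "end"),
    ("end", "1"): ("1", "R", "end"),
    ("end", "0"): ("0", "L", "erase"),
    ("erase", "1"): ("0", "N", "halt"),
}


def run(program: Program, tape: str, state: str = "scan",
        head: int = 0, max_steps: int = 10_000) -> str:
    """Run the program on a finite tape and return the final tape."""
    cells = list(tape)
    for _ in range(max_steps):
        if state == "halt":
            return "".join(cells)
        write, move, state = program[(state, cells[head])]
        cells[head] = write
        head += {"L": -1, "R": 1, "N": 0}[move]
        if not 0 <= head < len(cells):
            raise IndexError("head left the finite tape")
    raise RuntimeError("step limit exceeded")


# Two plus two: "11", separator "0", "11", and one trailing blank "0".
result = run(ADD, "110110")
print(result, result.count("1"))
```

The trailing 0 is required by this encoding so that the machine can detect the end of the second number without leaving the tape.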

3.8.6 Grammarware visualisation 

Various controversial thoughts on grammar recovery 
visualisation, related to the previous body of 
works on grammar recovery both (co)authored by 
me [Zay05; LZ09a; Zay10; Zay11b; LZ11; Zay12d; 
Zay12x; Zay12ac] and by the giants on whose shoulders 
I was standing [BSV97; LV99; SV00; LV01a; LV01b; 
KLV05a], yielded some experimental code, but no 
valuable stable results. 

In a draft sent to the "new ideas" track of FSE 
2012 [Zay12aj] to be rejected there, I have argued that 
introducing or improving visualisation of processes in 
grammarware engineering has at least these benefits: 

Process comprehension: it becomes easier to un- 
derstand the process and to see what exactly is 
happening when it is applied to certain input. 

Process verification: while complete formal veri- 
fication of a sophisticated process with many 
branches and underlying algorithms, may be a 
challenging task, it is relatively easy to pursue 
lightweight verification methods. One of them 
comes more or less for free when an experienced 
observer can see what is happening and detect 
peculiarities naturally. 

Process improvement: observing a process does 
not only let one find mistakes in it, but also get 
familiar with bottlenecks and other problematic 
issues, which in turn will help to suggest refine- 
ments and improvements. 

Interactiveness: there are many examples of processes 
which are impossible or infeasibly hard to 
automate completely, but for which reasonable 
automation schemes exist that exercise "semi-automation" 
and require occasional feedback 
from a system operator. The request-response 
loop for such feedback can be drastically shortened 
in the case of interactive visualisations. 

The point of the paper was well-received by the 
FSE NIER reviewers: nobody tried to argue that vi- 
sualisation techniques would be useless. However, I 
obviously overestimated a contribution that I could 
make with providing a "mile wide, inch deep" (a quote 
from one of the reviews) overview, so perhaps a much 
later overview with the list of solid achieved results, 
would be in order. For the sake of completeness of this 
report, I list the nine showcases that were briefly de- 
scribed in the NIER submission below. Each item of 
this list is a relatively low hanging fruit for an article 
or a series thereof. 

Grammar recovery: the state of the art in au- 
tomated grammar recovery (see also §3.3.3) is 
to work based on a set of appropriate heuristics 
[LZ11; Zay12x]. Proper visualisation of them 
would help: dealing with some particularly tricky 
notations; verifying that the heuristics do what 
they are intended to do; collecting evidence and 
statistics on the use of certain heuristics; propos- 
ing additional heuristics and other process im- 
provements. 

API-fication is a term used in [KLV05a] to describe 
a process of replacing low level API calls for ma- 
nipulating a data structure with more expressive 
and more maintainable high level API calls gener- 
ated from a grammar [JO04]. Thus, API-fication 
is a form of grammar-aware software renovation 
where surfacing grammar knowledge is a crucial 
contribution of the process. Visualising both the 
API calls themselves and the improvement steps 
on them, can serve as a motivation and even as a 
lightweight verification of API-fication. 

Grammar transformation & convergence. 

There are at least two commonly used ways 
to visualise a grammar: in a textual form as 
(E)BNF; or as a syntax diagram ("railroad 
track"). Neither of them has a designated 
visualisation notation for transformations. 

Mapping between grammar notations is one of the 
biggest challenges in research on grammars in a 
broad sense, since grammarware strives to cover 
such a big range of various structural definitions. 
Mapping between EBNF dialects [Zay12r], 
X/O mapping [Lam07], O/R mapping [O'N08], 
R/X mapping [Fer+02] and many other internotational 
mappings exist along with intranotational 
techniques for grammar diffing, graph comparison, 
nonterminal matching, model weaving, 
etc. Displaying matching artefacts in a traceable 
way by metagrammarware tools is usually rather 
limited and either displays local (mis)matches or 
global statistics. 

Grammarware coevolution. Concurrent and coupled 
evolution of grammars and language instances 
[Cic+08], of coexisting related grammars 
[Lam01], of grammars and language transformations 
[CH06], of language design and implementation 
[D'H+01] are special mixed cases 
of mapping and transformations (see the previous two 
items), where we would like to visualise both what 
kind of matches are made and what kind of actions 
are inferred from them. 

Grammar-based analysis comprises syntactic 
analysis (parsing), but also similarly geared 
techniques that never received enough atten- 
tion. As an example, it would be great to 
have something to demonstrate hierarchical 
lexical analysis [MN95] to the same degree as 
[AMUFVI09] demonstrated for LL and LALR 
parsing. 

Disambiguation is a process of filtering a parse forest 
or reasoning about the origins of it, in modern 
generalised parsing algorithms like SGLR [Vis97] 
or GLL [SJ10]. Visualising SGLR disambiguation 
[Bra+02] was implemented in the ASF+SDF 
Meta-Environment as a part of parse tree rendering, 
so in fact it visualised the ambiguities 
themselves and not the process of removing them, 
which was still of considerable help. More recent 
GLL disambiguation algorithms [Bas10] were expressed 
mostly in a textual form even within a 
PhD project entirely dedicated to ambiguity detection 
[Bas11], primarily because there is no 
clear understanding of how exactly they would be 
useful to visualise. 

Grammar-based testing methods based on com- 
binatorial (non-probabilistic) exploration of the 
software language under test, have emerged from 
recent research [LS06; FLZ12]. Visualising cov- 
erage achieved by them and adjusting the vi- 
sualisation with each new test case should help 
both to keep track of the process by expressing 
its progress, and to localise grammar fragments 
responsible for the failing test cases. 

Grammar inference is a family of methods of inferring 
the grammar, partially or completely, from 
the available codebase and even from code indentation 
[Cie+05; Nic+07; SC12]. Such inference 
is a complicated process based on heuristics and 
sometimes even on search-based methods. As a 
consequence, each attempt at grammar inference 
remains somehow unconnected to the rest of the 
research field: adoption of such methods by scientists 
and engineers outside the original working 
group happens rarely, if ever. One can think that 
a proper visualisation of such process would help 
new users to get acquainted with a grammar reconstruction 
system and tweak it to their needs. 

Production rule                           Production signature 
p('', Expr, Expr1)                        {⟨Expr1, 1⟩} 
p('', Expr, str)                          {⟨str, 1⟩} 
p('', Expr, Expr2)                        {⟨Expr2, 1⟩} 
p('', Expr, Expr3)                        {⟨Expr3, 1⟩} 
p('', Expr, int)                          {⟨int, 1⟩} 
p('', Function, seq([str, *(str), Expr])) {⟨Expr, 1⟩, ⟨str, 1*⟩} 
p('', Program, *(Function))               {⟨Function, *⟩} 
p('', Expr1, seq([str, *(Expr)]))         {⟨str, 1⟩, ⟨Expr, *⟩} 
p('', Expr2, seq([Ops, Expr, Expr]))      {⟨Ops, 1⟩, ⟨Expr, 11⟩} 
p('', Expr3, seq([Expr, Expr, Expr]))     {⟨Expr, 111⟩} 

Table 3: The JAXB grammar in a broad sense: in fact, 
an object model obtained by a data binding framework. 
Generated automatically by JAXB [FV99] from 
the XML schema for the Factorial Language [Zay12k]. 

NB: the last item was written before the pub- 
lication of the excellent grammar inference field 
overview [SC12], which can also be seen as considering 
visualisation in a very broad sense. 

Another newer initiative which can be seen as 
grammarware process visualisation, concerns guided 
convergence (see also §3.1). We can recall that the 
whole process of the guided grammar convergence is 
rather complicated and involves normalising the input 
grammar and going through several phases of unifi- 
cation to ensure the final nonterminal mapping that 
looks like this [Zay12k]: 

jaxb ⋄ master = {⟨Expr2, binary⟩, 
                 ⟨Expr3, conditional⟩, 
                 ⟨int, int⟩, 
                 ⟨Function, function⟩, 
                 ⟨str, str⟩, 
                 ⟨Program, program⟩, 
                 ⟨Expr, expression⟩, 
                 ⟨Expr1, apply⟩, 
                 ⟨Ops, operator⟩} 

While preparing the main guided grammar submis- 
sion, I have noticed that this particular mapping, as 
well as the normalised grammar (Table 3) and the 
list of weakly and strongly prodsig-equivalent produc- 
tion rules (Figure 4) can be automatically produced 
by the convergence tool virtually without any addi- 
tional effort in a completely transparent, traceable, 
reliable and reproducible fashion. This led to open 
publication of [Zay12k], an extended appendix for the 
main guided grammar convergence paper, which was, 
except for the two-page introduction, generated auto- 
matically, but is still readable and useful. 
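
The production signatures of Table 3 can be computed mechanically from the right-hand sides. The sketch below is a reading of the table rather than the definition from the convergence paper: it assumes that a signature pairs each nonterminal with the string of its occurrence markers ("1" for a plain occurrence, "*" or "+" for an occurrence under iteration), and that markers are ordered with "1" first so that permuted sequences get equal signatures. The tuple encoding and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Expr = tuple  # ("n", name) | ("seq", e1, ...) | ("star", e) | ("plus", e)
ORDER = "1?+*"  # assumed canonical order of occurrence markers


def footprints(e: Expr, marker: str = "1", acc=None) -> Dict[str, str]:
    """Collect, per nonterminal, the markers of all its occurrences."""
    acc = defaultdict(str) if acc is None else acc
    tag = e[0]
    if tag == "n":
        acc[e[1]] += marker
    elif tag == "seq":
        for child in e[1:]:
            footprints(child, marker, acc)
    elif tag in ("star", "plus"):
        footprints(e[1], "*" if tag == "star" else "+", acc)
    return acc


def prodsig(e: Expr) -> Set[Tuple[str, str]]:
    """Signature: set of (nonterminal, ordered footprint) pairs."""
    return {(n, "".join(sorted(f, key=ORDER.index)))
            for n, f in footprints(e).items()}


def rename(sig: Set[Tuple[str, str]], mapping: Dict[str, str]):
    """Apply a nonterminal mapping such as jaxb-to-master to a signature."""
    return {(mapping[n], f) for n, f in sig}


jaxb_function = ("seq", ("n", "str"), ("star", ("n", "str")), ("n", "Expr"))
jaxb_binary = ("seq", ("n", "Ops"), ("n", "Expr"), ("n", "Expr"))
master_binary = ("seq", ("n", "expression"), ("n", "operator"),
                 ("n", "expression"))
mapping = {"Ops": "operator", "Expr": "expression"}

print(sorted(prodsig(jaxb_function)))
print(rename(prodsig(jaxb_binary), mapping) == prodsig(master_binary))
```

Under these assumptions the two differently ordered definitions of the binary expression end up with the same signature once the nonterminal mapping is applied.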
JAXB-produced grammar (ANF)                 Master grammar 
p('', Expr, Expr1)                          p('', expression, apply) 
p('', Expr, str)                            p('', expression, str) 
p('', Expr, Expr2)                          p('', expression, binary) 
p('', Expr, Expr3)                          p('', expression, conditional) 
p('', Expr, int)                            p('', expression, int) 
p('', Function, seq([str, *(str), Expr]))   p('', function, seq([str, +(str), expression])) 
p('', Program, *(Function))                 p('', program, +(function)) 
p('', Expr1, seq([str, *(Expr)]))           p('', apply, seq([str, +(expression)])) 
p('', Expr2, seq([Ops, Expr, Expr]))        p('', binary, seq([expression, operator, expression])) 
p('', Expr3, seq([Expr, Expr, Expr]))       p('', conditional, seq([expression, expression, expression])) 

Figure 4: Matching of production rules with the Abstract Normal Form of the JAXB-produced grammar on 
the left and the master grammar on the right [Zay12k]. 



3.8.7 Wiki activity 

While contributing to wiki websites is not usually con- 
sidered an activity worthy of tracking or mentioning 
in the academic sense, of the 72 wiki-articles I have 
written in 2012 I can identify at least six that can be 
viewed as (popular) scientific writing: 

• Grammar in a broad sense (11 kB + 1 figure) 

• Technological space (16 kB) 

• Megamodelling (6 kB + 2 figures) 

• Island grammar (12 kB) 

• Adriaan van Wijngaarden (21 kB) 

• Ninomiya Sontoku (Kinjiro) (10 kB) 

3.8.8 Colloquium organisation 

Again, participating in organisation of various events 
is commonly considered normal for a practicing academic 
researcher, but is never counted as a scientific 
contribution. Not arguing with that, I am still happy 
to be able to maintain the existing seminar culture 
of CWI (Centrum Wiskunde & Informatica, my current 
employer) as a colloquium organiser of a series 
of events that have been taking place continuously at 
least since 1997². Over the course of 2012, 56 presentations 
were given in total as a part of Programming 
Environment Meeting (PEM, mostly an inter-institutional 
outlet), Software Engineering Meeting 
(SEM, mostly an internal group seminar) and a special 
one-day event Symposium on Language Composability 
and Modularity (SLaC'M, most trouble of organising 
which was taken by Tijs van der Storm). 
These speakers have appeared at PEM, SEM and 
SLaC'M in 2012 (in chronological order of their first 
appearance): 



• Dr. Vadim Zaytsev [Zay12l; Zay12c; Zay12ag; 
Zay12e; Zay12f] 

• Atze van der Ploeg [Plo12b; Plo12a] 

• Prof. Dr. Serge Demeyer [Dem12] 

• Dr. Alexander Serebrenik [Ser12] 

• Stella Pachidi [Pac12] 

• Dr. Tijs van der Storm [Sto12a; Sto12c; Sto12b] 

• Michael Steindorfer [Ste12a; Ste12b; Ste12c] 

• Dr. Antony Sloane [Slo12] 

• Riemer van Rozen [Roz12a; Roz12b] 

• Jeroen van den Bos [Bos12b; LB12; Bos12a] 

• Alex Loh [Loh12b; Loh12c; Loh12a] 

• Dr. Daniel M. German [Ger12] 

• Dr. Michael Godfrey [God12] 

• Dr. Mark Hills [Hil12c; Hil12b; Hil12a] 

• Davy Landman [LB12; Lan12a] 

• Luuk Stevens [Ste12d] 

• Dr. Krzysztof Czarnecki [Cza12] 

• Prof. Dr. Magne Haveraaen [HB12; Hav12] 

• Dr. Anya Helene Bagge [HB12; Bag12] 

• Dr. Sunil Simon [Sim12] 

• Dr. T. B. Dinesh [Din12] 

• Dr. Jurgen Vinju [Vin12a; Vin12b] 

• Dr. William R. Cook [Coo12] 

• Anastasia Izmaylova [Izm12] 

• Dr. Lennart Kats [Kat12] 

• Carel Bast, Wim Bast, Tom Brus [BBB12] 

• Tesfahun Tesfay [Tes12] 

• Dr. William B. Langdon [Lan12b] 

• Andrei Varanovich [Var12] 

• Dr. Joris Dormans [Dor12] 

• Sebastiaan Joosten [Joo12] 

• Dr. Magiel Bruntink [Bru12] 

• Dr. Patricia Lago [Lag12] 

• Prof. Dr. Frank Tip [Tip12] 

• Dr. Raphael Poss [Pos12] 

• Arjan Scherpenisse [Sch12] 



²http://event.cwi.nl/pem 
4 Venues 

Academic venues (mostly conferences, workshops and 
journals) are essential components of the research pro- 
cess: publishing there means community recognition; 
submitting eventually leads to receiving peer reviews; 
and even reading calls for papers can be very inspir- 
ing and eye-opening. Below I list two kinds of venues 
that contributed to my research in 2012: one list is 
for those where I have submitted, the other one for 
the rest — I am deeply grateful to all the reviewers 
and organisers of both kinds. The lists are not meant 
to cover all possible venues for my field, just those 
directly relevant to my activities this year. 

4.1 Exercised venues 

BX 2012 (ETAPS workshop) 

I have been a "bx-curious" person for quite a 
while, but BX 2012 was my first venue to come 
out. A very inspiring call for papers³, excellent 
atmosphere during the workshop, friendly 
and productive reviewers. A typical example of 
an event that appreciates you preparing a dedi- 
cated paper for which this becomes the one and 
only target venue. I submitted against all the 
odds (December deadlines are rather stressful), 
got there against all the odds (had to fly from 
ETAPS to SAC and then back) and still regret- 
ted nothing. I will not attend BX 2013 (my 
grandmother has her 80th birthday on the day 
of the workshop, and one has to set priorities), 
but I would if I could. Definitely recommended 
for people at least marginally interested in this 
field [Cza+09]. 

SAC 2012 (PL track) 

Yet another experimental submission in the 
sense that I knew almost no one from 
the programme committee at that moment. However, 
I know people from my technological space 
who published there, and the call for papers⁴ was 
inspiring, so I gave it a try, and did not regret 
it. The whole conference is huge, so I was afraid 
that attending would be unproductive, but I was 
proven wrong: if you know at least a couple of 
people with similar research interests and stick 
to them all the time, you will find many other 
similar researchers to talk to. I did not submit 
anything to SAC 2013 due to bad planning (holi- 
days right before the deadline are unproductive), 
but I definitely will consider it very seriously ev- 
ery year from now on. 

LDTA 2012 (ETAPS workshop) 

Trying to be a good programme committee member, 
I knew I had to attend, so I have submitted 
the best result of 2011 there: the Grammar 
Hunter. I was also pleased to see how the current 
call for papers⁵ positioned LDTA as "SLE, but 
with more grammarware". The future of LDTA 
remains to be determined, but it has departed 
from ETAPS and will most probably join forces 
with SLE. 

³http://www.program-transformation.org/BX12 
⁴http://www.cis.uab.edu/bryant/sac2012 

ECMFA 2012 

The call for papers⁶ made it look like I had a 
chance, so I submitted something that I believed 
to be of good quality and of possible interest to 
the modelware researchers. One of the reviewers 
said that the paper "clearly makes the most con- 
tribution of any paper I read" , which was rather 
encouraging, but ended up with rejection. In the 
end, I must conclude that I should have devoted 
this time to writing for ICPC or one of the journal 
special issues with deadlines around early spring. 

TFP 2012 

The call for papers⁷ looked challenging, but I really 
liked the "trends" aspect of it, since most traditional 
conferences dislike overview papers unless 
they are extremely strong and retrospective: 
there is simply no place for overviews of the cur- 
rent trends, unless you are already in the field 
and you systematically explore the "future work" 
sections of all papers you come by. In contrast to 
BX, this was an example of a venue that did not 
appreciate preparing a paper specifically for them 
on a topic relevant to me. In less than two weeks 
after submission I have received a short notifica- 
tion that it was judged to be out of scope. This 
was obviously not the only reason since other 
(stronger, less "trendy") papers from my tech- 
nological space like [SS12] were accepted, so I 
can only conclude that I have failed to explain 
the link between grammar transformation and 
the functional programming paradigm properly. 
Given the fact that I am not qualified to report 
on "trends" in any other field, I doubt that I will 
try sending anything to this venue in the future, 
but I surely do not discourage others from doing so. 
Personally for me, it would have been more 
productive to pursue MoDELS which had a competing 
deadline this year. 

JUCS (journal)

The call for papers^8 made it clear that this
special issue is linked to a workshop in which I
did not participate, but the call was open, and I
answered. I cannot say that this was much
appreciated: the reviews for [Zay12ae] came very late

^5 http://ldta.info/ldta_2012_cfp.pdf

^6 http://www2.imm.dtu.dk/conferences/ECMFA-2012/contributions/?page=cfp

^7 http://www-fp.cs.st-andrews.ac.uk/tifp/TFP2012/TFP_2012/CFP12.txt

^8 http://www.jucs.org/ujs/jucs/info/special_issues/sbcars_cfp.pdf
(several months after the notification deadline) and
were extremely short and discouraging.

SCAM 2012 

This is the third time I have served as a
programme committee member for SCAM, to which
I was invited after our paper with Ralf
Lämmel got a best paper award in 2009 [LZ09b].
I have never attended since that time, and re- 
ceived a warning that I will not be included next 
year if I miss the event again. So, putting date- 
conflicting events like SLE and CSMR aside, I 
did my best, which for me meant submitting one 
paper to SCAM and one to the colocated ICSM 
(see below). The topic chosen for SCAM (island 
grammars) seemed to be in scope of the call for
papers^9, but the paper was seen as weird and
immature, and was hopelessly rejected. The reviews
it received were pretty helpful, even though one 
of the reviewers really hated the "in the cloud" 
aspect (and that is exactly how I tried to sell 
it). Apparently, putting some effort into submit- 
ting something has already been noticed, since 
I have been, against all the odds, invited to the 
programme committee again for SCAM 2013. 

ICSM 2012 

The call for papers^10 came to my attention right
after the rejection letter from ECMFA, and I 
decided that ICSM would be a good venue for 
the guided grammar convergence methodology 
(§3.1). Getting a paper there would also increase 
my chances at going to SCAM (see above for the 
reasons). The reviews were rather cold, but most of
them (all except one) were useful nonetheless.

NordiCloud (WICSA/ECSA workshop) 

Not really being an architecture researcher,
I would have never considered going to
WICSA/ECSA, but the call for contributions^11
was out just a couple of weeks after my
SCAM rejection, and I did not have enough
energy to rewrite the island parsing paper
completely, so NordiCloud was a relatively cheap way
for me to resubmit the same material after a
minor revision. It did not pay off: most of the
reviewers were scared off just by seeing a
grammar-related submission.

FSE 2012 (NIER track) 

The call for papers^12 came out at a very busy
time, but the four-page limit was easily reachable,
so I submitted two papers on different new
ideas. Unfortunately, they were indeed more like
idealistic proposals for discussing and considering

^9 http://scam2012.cs.usask.ca/CFP

^10 http://selab.fbk.eu/icsm2012/download/cfp-icsm2012.pdf

^11 http://46.22.129.68/NordiCloud/?page_id=39

^12 http://www.sigsoft.org/fse20/cfpNewIdeas.html



certain aspects, than usual "short papers" that 
are just normal papers at the early stage. Both 
were hopelessly rejected, and I still want to find 
some venue for the future that would be good 
for sharing and discussing fresh ideas — perhaps 
OBT? I have to try to find out. 

SoTeSoLa 2012 (summer school) 

An experiment in "Research 2.0" driven mostly
by Jean-Marie Favre and Ralf Lämmel, this summer
school was far from a typical one. There
were a lot of innovations: submitting a one-page
profile of yourself, making a one-minute video
about yourself, listening to lots of remote
lectures, having a hackathon distributed in time and
space, registering at a social networking website,
etc. Not all of them were entirely successful:
partly due to being ahead of their time, partly due
to other reasons, which are now being dissected,
analysed and researched by Jean-Marie Favre. I
was involved in all kinds of activities from a
relatively early stage, and in the end my role was
officially classified as a "Social Media Chair"
and a "Hackathon Lead Coordinator". This was
not a publishing venue, and I did not give any
invited lecture, but it was fun to be a part of it.

SATToSE 2012 (seminar) 

A non-publishing seminar series where I gave
a presentation on bidirectional grammar
transformation [Zay12b]. The material presented
there was in a state somewhere between [Zay12r]
and the planned future paper on bidirectionalisation.

POPL 2013 

The call for papers^13 was concise and crunchy,
but POPL is one of the venues that do not
require much advertisement. I poured a lot
of effort into [Zay12j], completely redesigned the
convergence process (see §3.1), reimplemented
the prototype and rewrote the paper with
respect to [Zay12h; Zay12i]. In a way, it did pay
off: the paper was rejected, but the reviews were
among the most useful that I have received this
year.

NWPT 2012 

The call for papers^14 was brought to my
attention by Anya Helene Bagge, a co-organiser of this
workshop. In the extended abstract that I
submitted there, I apparently went overboard with
the required abstraction level and the assumed level
of grammatical knowledge, and the recent POPL
rejection has possibly jeopardised the outsourcing
of the usefulness statement of the method. The
reviews were curt and bleak.

^13 http://popl.mpi-sws.org/2013/popl2013-cfp.pdf
^14 http://nwpt12.ii.uib.no/call-for-papers
EMSE (journal)

The call for papers^15 called for "experimental
replications" and went to great lengths explaining
how important it is to be able to publish not
just the experiments themselves, but also
replications thereof. I was immediately convinced, but
decided to reinterpret the definition of a
replication. Instead of doing classical empirical
studies, I presented research activities (and in
particular prototype engineering) as experiments. That
way, the replications were also "experiments" in
the sense that they were intended to cover an older
experiment and could therefore be measured and
assessed on the grounds of that coverage. I
could even find some related work on the topic in
the form of papers that described the prototype
development process itself. My paper was intended
to contain three case studies: (1) replicating the
grammar convergence case study of the Factorial
Language from [LZ09a] with the guided grammar
convergence methodology (see §3.1); (2) replicating
a bigger grammar convergence case study of
Java from [LZ11] with the more abstract and
concise Extended XBGF (see §3.2.4); (3) replicating
both of these case studies with a bidirectional
EBGF (see §3.2.2). Due to the insane amount of
work that this turned out to be, only the first
two replications made it into a 42-page long
paper [Zay12m]. Only one of the three reviewers was
excited by my approach, and all three agreed that
the empirical software engineering journal is not
the right venue for such a report.

MPM 2012 (MoDELS workshop)

Basically, this venue was chosen after I had
written the paper. The text underwent some polishing
after the choice was made, but the topic was not
adjusted. I had a nice idea of transforming
megamodels in order to make a good story out of
them (see §3.5): a substantial contribution was
not yet there (and such work is still ongoing), but
I wanted to expose it to the public and to discuss
it first. The call for papers^16 for MPM looked
the most inviting for this kind of cross-paradigm
approach among all MoDELS workshops, and
indeed the reviewers found the paper weird yet
acceptable, so I was able to give a short presentation
and hang my poster there [Zay12aa].

XM 2012 (MoDELS workshop)

The topics list^17 provided by the organisers of
this workshop was fascinating, and I desperately
wanted to submit something, but eventually gave
up on finding the time it deserved. Soon after that,
the deadline was extended, and I had no other

^15 http://sequoia.cs.byu.edu/lab/?page=reser2013&section=emseSpecialIssue

^16 http://avalon.aut.bme.hu/mpm12/MPM12-CFP.pdf
^17 http://www.di.univaq.it/XM2012/page.php?page_id=13



choice than to write down the idea that had been
floating around in my head for a while (see §3.2.3).

SCP (journal) 

The call for systems^18 was very much in sync with
what its guest editors have tried to achieve in
recent years, and I support them wholeheartedly in
that. The Grammar Zoo, one of the essential parts of
the SLPS [Zay+08], did not receive a lot of
my attention in 2012, but was always on my
mind; it was packaged and submitted there both as
an available system and as an important repository
of experimental systems in grammarware.
The outcome will become known in 2013.

4.2 Inspiring venues 

There have been many venues that I did not submit
anything to, but not because I did not want to. Their
calls for papers gave me inspiration to work on
something, even though I was not productive enough to
meet their deadlines or to produce anything of
value at the required level.

MSR 2012 

The mining challenge^19 of MSR looked very
interesting, so I looked at it, but since I was looking
specifically for grammars, it did not work out at
all: only two grammars were found, and there
was no sensible way to connect them to the rest
of the system. If more of them could have been
obtained, written in a variety of EBNF dialects,
it could have become an interesting case study
similar to [Zay12ac].

Laws and Principles of Software Evolution 

The call for papers^20 for this special issue of
JSME looked tempting, so I even emailed the
editors, asking for some additional information.
Unfortunately, the collaboration that I hoped to
achieve with other people did not work out, and
nothing was produced in time.

Success Stories in Model Driven Engineering 

The call for papers^21 came out at the time when
I was busy with all kinds of other initiatives. Be- 
sides that, this special issue of SCP was actually 
looking for extended reports on already published 
projects, and I was busy with new experiments. 
Possibly, a strong "lessons learnt" kind of paper 
on grammar hunting would make sense, but I was 
too immersed in new stuff at the time to go back. 
However, I have to admit that when/if I finally 
sit down to write a comprehensive grammar re- 
covery paper (i.e., connecting §§3.3.3, 3.8.1, 3.8.6 
and 3.6), it must go to either SCP or SP&E. 

^18 http://www.win.tue.nl/~mvdbrand/SCP-EST
^19 http://2012.msrconf.org/challenge.php
^20 http://listserv.acm.org/scripts/wa-acmlpx.exe?A2=ind1111&L=seworld&F=&S=&P=25841
^21 http://www.di.univaq.it/ssmde
CloudMDE (ECMFA 2012 workshop)

This was the venue that gave me the eerie thought 
of writing a "parsing in the cloud" paper (see 
§3.8.2). However, I was disheartened by the
rejection of [Zay12h] at ECMFA and decided not to
submit anything to ECMFA workshops^22.

ICPC 2012 

The call for papers^23 competes date-wise with
many other good venues, so this year ICPC just
happened not to be among the ones I chose
as my targets.

RC 2012 

The call for papers^24 for the fourth workshop on
reversible computation gave me a lot of ideas and 
keyword pointers for the bidirectionality topic. 
However, I did not feel confident enough to sub- 
mit anything. Anyway, thanks a lot and congrat- 
ulations on becoming a conference in 2013! 

CSCW 2013 

This is not a typical venue for me, but I have a
dream of eventually submitting something
wiki-related there. The call for participation^25 was as
good as it always is, and even better this year
because they have introduced a new rule concerning
the paper size: 10 pages is no longer the limit, it
is rather a standard. If your idea fits on a smaller
number of pages, your reviewers have the right
to complain if you try to bloat your submission.
On the other hand, if that is not enough, you can
always make your paper longer, but the
contribution then needs to grow accordingly. I believe
that with small incremental and non-disruptive
ideas like these, we could achieve modern
comfortable publishing models more easily than with
endeavours to revolutionise the field.

PPDP/LOPSTR 2012 

The calls for papers^26,27 were both interesting,
but at my level I could not actually decide be- 
tween the two venues. I was working honestly to- 
ward the seemingly achievable goal (see §3.2.5), 
but it turned out to be unachievable. Being in- 
secure about my ability to write a strong paper 
about negative results, I gave up. 

FM+AM 2012 (SEEM workshop) 

This was the workshop^28 that set my thoughts

^22 V. Zaytsev (grammarware). "Yet another bridging
attempt failed: my grammar paper got rejected at @ecmfa2012.
Now I will also go submit the #CloudMDE draft
elsewhere." Tweet, https://twitter.com/grammarware/status/189976445995593728.
11 April 2012, 9:21.
^23 http://icpc12.sosy-lab.org/CfP.pdf
^24 http://www.reversible-computation.org/2012/index91b1.html?call_for_papers
^25 http://cscw.acm.org/participation_paper.html
^26 http://dtai.cs.kuleuven.be/events/PPDP2012/ppdp-cfp.txt

^27 http://costa.ls.fi.upm.es/lopstr12/cfp.pdf
^28 http://ssfm.cs.up.ac.za/workshop/FMAM12.htm



in the agile/extreme mode, which ultimately led 
to the paper at XM (see §3.2.3) simply because I 
did not manage to complete the work before the 
FM+AM deadline. Imagine my surprise when I
found out that FM+AM was cancelled due to the
lack of good submissions! 

WoSQ 2012 (FSE workshop) 

The call for papers^29 led me to believe that
this would be a good possible venue for the
paper on grammar mutations (see §3.8.1). However,
the time was too tight, and both of my NIER
submissions had been rejected, so an FSE workshop
stopped looking that attractive after all.

SQM 2012 (CSMR workshop) 

The workshop^30 happened at the same time as I
was attending both ETAPS and SAC, so I could
not possibly be at a third place at the same
time as well, but I just want to name it as a
relatively small venue where I have enjoyed reviewing
a couple of papers as a PC member (I will be on the
PC next year as well).

WCN 2012 

The website^31 is in Dutch, as is the conference itself.
This was my second experience of being a Program
Chair (the first one was with WCN 2011), and
this time I counted: 842 emails needed to be sent
or answered by me in order for this conference
to happen. Luckily, CWI (my employer) did not
mind, since they could proudly list "one of theirs"
as the Program Chair of a venue where one of the
keynote speakers is Jimmy Wales [Wal12].

5 Concluding remarks 
5.1 Immediate results 

Writing this extended year report has achieved at least 
three goals: 

Streamlining new ideas. Re-explaining
(renarrating?) research ideas and putting them in
perspective has helped to crystallise them into
publishable, achievable objectives.

Knowledge dissemination. This document can
serve both as a scientific report for my colleagues
and superiors, and as an entry point for people
who want to get acquainted with my results
for other reasons.

Case study in self-archiving. As I have said in
the introduction, this report can be seen as an
advanced form of self-archiving. It was a relatively
big effort, compared to the traditional "just put
the PDF online" thing. Together with the open
notebook initiative, it stressed the paradigm and

^29 http://sites.google.com/site/wosq2012/cfp

^30 http://sqm2012.sig.eu

^31 http://www.wikimediaconferentie.nl
raised some questions yet to be answered (e.g.:
How to properly break one atomic SKO,
essentially a publication, into subatomic ones to
distinguish "I've done this for the tool that was later
described in this paper" from "I've done this for
the particular version of this paper, which was
later rejected"? What are all possible stages in
the SKO lifecycle?).

5.2 Special features 

The presence of an open notebook. A lot of
claims about dates, continuations and amounts
of effort, made on the pages of this report, can be
reformulated into queries on the open notebook,
and formally validated as such. For now these
claims were intentionally made in plain text
because no reliable or traditionally acceptable
infrastructure exists for them (yet).

Open access window. All the papers mentioned
here were put online immediately after their
submission (unless prohibited explicitly by the
submission rules), and taken down immediately
after their rejection (if any). At this stage, I do
not know any better way to expose one's research
results to the public: official acceptance can take
months or even years, during which one could have
profited from sharing the contents around.

Rejected material. Not all rejected papers are re- 
jected because they are inherently, irreparably 
bad: some turn out to be out of scope, lack- 
ing some essential results or simply not mature 
enough to be published (yet). With this report, 
I have exposed most of the dark data concerning 
my rejected material. 

Unfruitful attempts. Also classified as dark data
by [Goc07], but of an entirely different nature:
these are failed experiments, i.e., prototypes that
have never made it to the point of being ready
to be described in a paper. There can be traces 
of such unfruitful attempts in presentations and 
other subatomic SKOs before their futility be- 
comes apparent. 

Venues. Knowledge about workshops, conferences 
and journals seems to float around in the aca- 
demic community and is usually distributed as 
folklore, if at all. There are many reasons for
this, ranging from the lack of incentive to the
fear of occasional offence.